ABSTRACT

Accurate photometric redshifts are among the key requirements for precision weak
lensing measurements. Both the large size of the Sloan Digital Sky Survey (SDSS)
and the existence of large spectroscopic redshift samples that are flux-limited beyond
its depth have made it the optimal data source for developing methods to properly
calibrate photometric redshifts for lensing. Here, we focus on galaxy-galaxy lensing
in a survey with spectroscopic lens redshifts, as in the SDSS. We develop statistics
that quantify the effect of source redshift errors on the lensing calibration and on
the weighting scheme, and show how they can be used in the presence of redshift
failure and sampling variance. We then demonstrate their use with 2838 source galaxies
with spectroscopy from DEEP2 and zCOSMOS, evaluating several public photometric
redshift algorithms, in two cases including a full p(z) for each object, and find lensing
calibration biases as low as <1% (due to fortuitous cancellation of two types of bias)
or as high as 20% for methods in active use (despite the small mean photoz bias of
these algorithms). Our work demonstrates that lensing-specific statistics must be used
to reliably calibrate the lensing signal, due to asymmetric effects of (frequently non-Gaussian)
photoz errors. We also demonstrate that large-scale structure (LSS) can
strongly impact the photoz calibration and its error estimation, due to a correlation
between the LSS and the photoz errors, and argue that at least two independent
degree-scale spectroscopic samples are needed to suppress its effects. Given the size
of our spectroscopic sample, we can reduce the galaxy-galaxy lensing calibration error
well below current SDSS statistical errors.
* Based in part on observations undertaken at the European Southern Observatory (ESO) Very Large Telescope (VLT) under Large Program 175.A-0839.
1 INTRODUCTION

Galaxy-galaxy lensing is the deflection of light from dis- 
tant source galaxies due to the matter in more nearby lens 
galaxies. In the weak regime, gravitational lensing induces 
0.1-10% level tangential shear distortions of the shapes of 
background galaxies around foreground galaxies, allowing 
direct measurement of the galaxy-matter correlation func- 
tion around galaxies. Due to the very small signal, typical 
measurements involve stacking thousands of lens galaxies to 
get an averaged lensing signal. 

Since the initial detections of galaxy-galaxy (g-g) lensing
(Tyson et al. 1984; Brainerd et al. 1996; Hudson et al. 1998;
Fischer et al. 2000; Smith et al. 2001; McKay et al. 2001), it
has been used to address a wide variety of astrophysical
questions using data from numerous sources. These
applications include (but are not limited to) determining
the relation between stellar mass, luminosity, and halo mass
to constrain models of galaxy formation (Hoekstra et al.
2005; Heymans et al. 2006a; Mandelbaum et al. 2006c);
understanding the relation between halo mass from lensing
and bias from galaxy clustering to constrain cosmological
parameters (Sheldon et al. 2004; Seljak et al. 2005);
measuring galaxy density profiles (Hoekstra et al. 2004;
Mandelbaum et al. 2006b); and understanding the extent of
tidal stripping of the matter profiles of cluster satellite
galaxies (Natarajan et al. 2003; Limousin et al. 2007). In the
future, galaxy-galaxy lensing will be used for geometrical tests
that constrain the scale factor a(t) and curvature Ω_k of
the Universe (Jain & Taylor 2003; Bernstein & Jain 2004;
Bernstein 2006). As data continue to pour in, and future
surveys are planned with even greater statistical power, the
time has come to place galaxy-galaxy lensing on a firmer
foundation by addressing systematics to greater precision.

The g-g lensing signal calibration depends on
several systematics, including the calibration of the
shear (Heymans et al. 2006b; Massey et al. 2007)
and theoretical uncertainties such as galaxy intrinsic
alignments (Agustsson & Brainerd 2006; Altay et al.
2006; Heymans et al. 2006c; Mandelbaum et al. 2006b;
Faltenbacher et al. 2007), both areas in which there is
significant ongoing work. Here, we focus on the proper
calibration of the source redshift distribution for galaxy-galaxy
lensing in the case where all lens redshifts are known.
The SDSS has the unusual capability of offering
spectroscopic redshifts for all lenses, which both removes
any calibration bias due to error in lens redshift estimation
and allows us to compute the signal as a function of
physical transverse (instead of angular) separation from the
lenses, simplifying theoretical interpretation. While several
theoretical studies have estimated the effects of photoz
errors for shear-shear autocorrelations (Huterer et al. 2006;
Ma et al. 2006; Abdalla et al. 2007; Bernstein & Ma 2007),
we present the first such analysis for galaxy-galaxy lensing,
in which we not only offer statistics to use to evaluate
the calibration bias, but also carry out an analysis with
attention to practical issues such as sampling variance in
the calibration sample. This work will therefore enable
future g-g lensing analyses with other datasets to address
other scientific questions, and reveal potential issues with
spectroscopic calibration of photoz's that are more general
than just g-g lensing. We also address the extension of these
techniques to galaxy-galaxy lensing without lens redshifts,
and to cosmic shear, in the Appendix.

Currently, there are two methods used for source
redshift determination in g-g lensing. The first is the use of an
average redshift distribution for the sources. The primary
difficulty with this method is finding a sample of galaxies
with spectroscopy that has the same selection criteria as
the source galaxies. Weak lensing requires well-determined
shapes for each source, so a lensing source catalog is not
purely flux-limited, and literature estimates of dN/dz for
flux-limited samples may not be appropriate (we show in
this paper that for SDSS, the lensing-selected sample is at
a higher mean redshift than the corresponding flux-limited
sample at fixed magnitude). The solution is to find a
spectroscopic sample that overlaps the source sample and is at
least as deep, and to use it to determine the redshift distribution
from only the lensing-selected galaxies in the spectroscopic
sample. For deeper lensing surveys, no such spectroscopic
sample exists. In other cases, it exists but may be quite small,
with large uncertainty in dN/dz due to Poisson error and,
more significantly, large-scale structure. The second
difficulty is that without individual redshift estimates for each
source, there is no way to remove sources that are physically
associated with lenses from the source sample, which can
lead to dilution of the lensing signal by non-lensed galaxies
(a systematic that is easily controlled) and, more
significantly, signal suppression due to intrinsic alignments [which
cannot yet be easily controlled (Agustsson & Brainerd 2006;
Mandelbaum et al. 2006b), and which can cause
contamination larger than the size of the statistical errors for small
transverse separations].

The second method is to use broad-band photometry
to measure photometric redshifts (photoz's) for each source
galaxy. Photoz estimation exploits the fact that even with
broad passbands, we can still learn enough about the
spectral energy distribution to estimate the redshift. While
photoz estimation that yields accurate values over a wide range
of redshifts for all galaxy types is difficult, there have been
several recent successes in this field (Feldmann et al. 2006;
Ilbert et al. 2003). To fully constrain the calibration of the
g-g lensing signal, we must understand the full photoz error
distribution as a function of many parameters, particularly
those relevant to galaxy-galaxy lensing, such as brightness,
colour, environment, and of course redshift. Since the
photoz error distributions will depend on a complex interplay
between the widths and shapes of the filter functions, the
set of filters used in the photoz estimates, the photometry
error distributions, and the spectral energy distributions of
the galaxies themselves, the photoz error distributions will
not be symmetric or Gaussian in general, even if the
photometric errors in flux are Gaussian (the magnitude errors
are not Gaussian in any case, and some photoz methods use
magnitudes instead of fluxes). To be accurate, this photoz error
distribution must be determined with a sample of galaxies
with the same selection criteria (depth, colour, etc.) as the
source sample. This is quite important because, as the
photometry gets noisier, the photoz error distribution can not
only broaden, but also develop asymmetry, tails, and
other non-Gaussian properties.
So, as for methods that use a statistical source redshift
distribution, we once again must find a large spectroscopic
sample with the same selection criteria as our source
catalog. (Some photoz methods also require a training
sample with the same selection criteria as the source sample.)
The completeness and rate of spectroscopic redshift failure
are both potentially important, particularly if the
spectroscopic redshift failures all lie in a specific region of redshift
or colour space. If a photoz method has a significant failure
fraction, then we may be forced to eliminate a large
fraction of the source sample, thus increasing statistical error
significantly. Three major advantages of photoz's for
lensing are that they (1) allow us to eliminate some fraction
of the physically-associated lens-source pairs, thus reducing
the effects of intrinsic alignments, (2) allow us to optimally
weight each galaxy by the expected signal, and (3) allow us
to reduce, if not eliminate, "sources" that are in the
foreground from the sample entirely (a special case of optimal
weighting).

We present a method to obtain robust, percent-level 
calibration of the g-g lensing signal using a sample of several 
thousand spectroscopic redshifts selected from the source 
sample (i.e., with the same selection criteria). The sources 
of spectroscopy we use to demonstrate this method are the 
DEEP2 and zCOSMOS surveys (described in Section 2). The
use of two surveys in two areas of the sky carried out with
two different telescopes is important, because (a) they do not
have the same patterns of redshift failure, and (b) the
large-scale structure in the two fields is uncorrelated,
so effects of sampling (cosmic) variance are reduced
for the combined sample. In addition, we use space-based 
data for the full COSMOS sample to quantify the efficacy 
of our star/galaxy separation scheme. 

We then use this method to analyze the redshift-related
calibration bias of the lensing signal in previous
g-g lensing analyses that used our SDSS source catalog
(Hirata et al. 2004; Mandelbaum et al. 2005; Seljak et al.
2005; Mandelbaum et al. 2006a,b,c; Mandelbaum & Seljak
2007). Our calibration bias analysis is quite important, as
our statistical error for some applications has dropped below
5%, making our systematics requirements more stringent.

More importantly, we take a broad view, testing not just 
the redshift determination methods that we have used in the 
past, but also several new ones that have been developed in 
the past few years, in order to determine which ones are 
most useful for lensing. In the process, we determine which 
common photoz failure modes and error distributions are 
most problematic for g-g lensing. The results of our analysis
will be directly useful for SDSS g-g lensing, and the method
we present is generally useful for future weak lensing
analyses (and generalizable to scenarios without spectroscopy for
lenses and to shear-shear autocorrelations), particularly as
larger, deeper spectroscopic datasets become available.

In Section 2 we describe the lensing source catalog and
the spectroscopic redshift samples. Section 3 includes a
description of the source redshift determination algorithms
that we will test in this work. In Section 4 we describe our
method for determining the source redshift-related
calibration bias, including handling complexities such as large-scale
structure. We present the results of our analysis in Section 5
and discuss the implications of these results in Section 6.

When computing angular diameter distances, we
assume a flat cosmology with $\Omega_m = 0.27$ and $\Omega_\Lambda = 0.73$.
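All distances below assume this flat cosmology. As a minimal sketch (not the pipeline actually used for this work), the Python below computes comoving and angular diameter distances in $h^{-1}$ Mpc for $\Omega_m = 0.27$, $\Omega_\Lambda = 0.73$, neglecting radiation; the function names and the use of Simpson's rule are illustrative choices. For a flat universe the lens-source distance follows from comoving distances as $D_{LS} = (\chi_S - \chi_L)/(1+z_S)$, which is what the two-argument form computes.

```python
import math

# Flat LCDM parameters adopted in the text; radiation is neglected here.
OMEGA_M = 0.27
OMEGA_L = 0.73
HUBBLE_DISTANCE = 2997.92458  # c / H0 in Mpc/h


def inv_efunc(z):
    """1 / E(z) = H0 / H(z) for a flat matter + cosmological-constant universe."""
    return 1.0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def comoving_distance(z, n_steps=1000):
    """Line-of-sight comoving distance to redshift z in Mpc/h (Simpson's rule)."""
    if z <= 0.0:
        return 0.0
    n = n_steps + (n_steps % 2)  # Simpson's rule needs an even number of steps
    step = z / n
    total = inv_efunc(0.0) + inv_efunc(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * inv_efunc(i * step)
    return HUBBLE_DISTANCE * total * step / 3.0


def angular_diameter_distance(z1, z2=None):
    """D_A to z1 or, if z2 is given, between z1 < z2, in Mpc/h (flat universe only)."""
    if z2 is None:
        return comoving_distance(z1) / (1.0 + z1)
    return (comoving_distance(z2) - comoving_distance(z1)) / (1.0 + z2)
```

The flat-universe shortcut for $D_{LS}$ would not hold if $\Omega_k \neq 0$; a curved cosmology would need the usual $\sin/\sinh$ transverse comoving distances instead.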



2 DATA 
2.1 SDSS 

The data used for the lensing source catalog are obtained
from the SDSS (York et al. 2000), an ongoing survey to
image roughly π steradians of the sky, and follow up
approximately one million of the detected objects spectroscopically
(Eisenstein et al. 2001; Richards et al. 2002; Strauss et al.
2002). The imaging is carried out by drift-scanning
the sky in photometric conditions (Hogg et al. 2001;
Ivezić et al. 2004), in five bands (ugriz) (Fukugita et al.
1996; Smith et al. 2002) using a specially-designed
wide-field camera (Gunn et al. 1998). These imaging data are
used to create the source catalog that we use in this
paper. In addition, objects are targeted for spectroscopy
using these data (Blanton et al. 2003b) and are observed
with a 640-fiber spectrograph on the same telescope
(Gunn et al. 2006). All of these data are processed by
completely automated pipelines that detect and
measure photometric properties of objects, and
astrometrically calibrate the data (Lupton et al. 2001; Pier et al.
2003; Tucker et al. 2006). The SDSS is well underway, and
has had seven major data releases (Stoughton et al. 2002;
Abazajian et al. 2003, 2004, 2005; Finkbeiner et al. 2004;
Adelman-McCarthy et al. 2006, 2007a).



The source sample we describe was originally
presented in Mandelbaum et al. (2005), hereinafter M05. It
includes over 30 million galaxies from the SDSS
imaging data with r-band model magnitude brighter than
21.8. Shape measurements are obtained using the
REGLENS pipeline, including PSF correction done via
re-Gaussianization (Hirata & Seljak 2003) and with selection
criteria designed to avoid various shear calibration biases. A
full description of this pipeline can be found in M05.



2.2 DEEP2 



The DEEP2 Galaxy Redshift Survey (Davis et al. 2005;
Madgwick et al. 2003; Coil et al. 2004; Davis et al. 2005)
consists of spectroscopic observations of four fields using
the DEep Imaging Multi-Object Spectrograph (DEIMOS;
Faber et al. 2003) on the Keck Telescope. This paper uses
data from field 1, the Extended Groth Strip (EGS),
centered at RA 14h17m, Dec. +52° 30′ (J2000) and with
dimensions 120′ × 15′ (Davis et al. 2007). Galaxies brighter
than $R_{\rm AB} = 24.1$ were observed in all four DEEP2 fields,
but in the other three fields besides EGS, two colour cuts
were made to exclude galaxies with redshifts below z ~ 0.7.
The DEEP2 EGS sample, in contrast, includes objects of all
colours with $R_{\rm AB} < 24.1$, although colour-selected z < 0.75
objects with $21.5 < R_{\rm AB} < 24.1$ receive slightly lower
selection weight. This is the sample from which a bright subset,
r < 21.8, was extracted for this paper. The selection
probabilities for all objects are well known, allowing us to account
for this deweighting directly, though this has little impact
on this study, since only a small fraction of galaxies with
useful SDSS shape measurements are fainter than R = 21.5,
and they have little statistical weight due to their larger
shape measurement errors. Due to saturation of the CFHT
detectors used for target selection, no galaxies brighter than
$R_{\rm AB} \sim 17.6$ were targeted; these galaxies constitute a very
small fraction of our source sample.

For this paper, we use all EGS data collected through
the spring of 2005, a parent catalog of more than 13 000
spectra (Davis et al. 2007). The 155 DEEP2 EGS objects
with r < 21.8 (the limit of our source catalog) that failed to
yield redshifts in initial DEEP2 analyses were reexamined in
detail; after this effort, the net redshift success rate (defined
as DEEP2 quality 3 or 4) was 96%, significantly higher than
for the full EGS sample. The positions of the DEEP2 EGS
matches in our source catalog are shown in the right panel of
Fig. 1. There are ~1530 SDSS galaxies in this region with
matches in DEEP2 at r < 21.8. Roughly 65% of those pass
the lensing selection, leaving us with a sample of 1013.

2.3 zCOSMOS 

The other redshift survey used for this work is zCOSMOS
(Lilly et al. 2007), which uses the VIsible Multi-Object
Spectrograph (VIMOS; Le Fèvre et al. 2003) on the 8-m
European Southern Observatory's Very Large Telescope (ESO
VLT) to obtain spectra for galaxies in the COSMOS field,
which is 1.7 deg² centered at RA 10h, Dec. +2° 12′ 21″. We
use data from the zCOSMOS-bright survey, which is purely
flux-limited to $I_{\rm AB} \sim 22.5$, well beyond the flux limit of our
source catalog, and currently contains $\sim 10^4$ galaxies (Lilly
et al., in prep.). Observations began in 2005 and will take
at least three years to complete.

One important benefit of the zCOSMOS data is that,
due to its location in the Cosmic Evolution Survey
(COSMOS) field (Capak et al. 2007; Scoville et al. 2007b,a;
Taniguchi et al. 2007), there are very deep broadband
data from a variety of telescopes in addition to
a single-passband observation from the Advanced Camera
for Surveys (ACS) on the Hubble Space Telescope (HST).
This photometry has been used to generate extremely
high-quality photometric redshifts using the Zurich
Extragalactic Bayesian Redshift Analyzer (ZEBRA; Feldmann et al.
2006), which will be described further in Section 3.2, and
several other photoz codes (Mobasher et al. 2007). Using data
with u*, B, V, g′, r′, i′, z′, and Ks photometry, the
photometric redshift accuracy for the bright, I-selected sample
is remarkable, $\sigma_{\Delta z/(1+z)} < 0.03$. This accuracy is achieved
using 10% of the zCOSMOS sample as a training set. In
cases of spectroscopic redshift failure, these nearly noiseless
photoz's can be used instead. We will demonstrate explicitly
that the effect on the estimated lensing redshift calibration
bias of using their photoz's for redshift failures is within
the statistical error. Consequently, the nominal 8%
spectroscopic redshift failure rate for zCOSMOS galaxies in our
source catalog is effectively zero for our purposes.

The HST imaging in the full COSMOS field was also 
used for another test because it enables star/galaxy separa- 
tion to be performed more accurately than in SDSS. Conse- 
quently, we use the full COSMOS galaxy sample to match 
against our source catalog and identify the stellar contami- 
nation fraction to high accuracy. 

The positions of the zCOSMOS matches in our source
catalog are shown in the left panel of Fig. 1. We have spectra
in an area covering ~1.5 square degrees, 88% of the eventual
area of the zCOSMOS survey. The sampling is denser in
some regions than in others (and will eventually be filled
out evenly in the full area). In this region, there are ~3000
SDSS galaxies with r < 21.8; roughly 65% pass our lensing
selection cuts, leaving us with 1825 matches in the source
catalog.

Figure 1. Positions of the zCOSMOS (left) and DEEP2 (right)
spectroscopic galaxies used in this work.



3 REDSHIFT DETERMINATION ALGORITHMS

Here we describe the source redshift determination algo- 
rithms in more detail. We begin with those used in our 
current lensing source catalog, for which we want to assess 
calibration biases in past works, then describe methods that 
have more recently become available. 



3.1 Previous methods 

In our catalog, which was created in 2004, we used three
approaches to source redshift determination, all described in
detail in M05. For the r < 21 sources, we used photometric
redshifts from kphotoz v3_2 (Blanton et al. 2003a) and their
error distributions determined using a sample of 162
galaxies in the DEEP2 EGS. We also required $z_p > z_l + 0.1$ to
avoid contamination from physically-associated lens-source
pairs. For the r > 21 sources, we used a source redshift
distribution from DEEP2 EGS (from fitting to 116
redshifts), which means that we lack individual redshift
estimates for each source. The sample of redshifts used for this
early work with the EGS was a factor of 3.5 smaller than
the EGS sample used for this work, or a factor of 10 smaller
than the combined EGS + zCOSMOS sample used here.
For the high-redshift LRG source sample (see selection
criteria in M05), we used well-calibrated photometric redshifts
and their error distributions determined using data from the
2dF-SDSS LRG and Quasar Survey (2SLAQ), as presented
in Padmanabhan et al. (2005).



3.2 New options 

There are several relatively new photoz options for SDSS
data, all of which have relatively low failure rates of ~5%.
The first is available in the SDSS DR5 (data release 5)
SkyServer "Photoz" table (Budavári et al. 2000; Csabai et al.
2003). The photoz's for this template method are
determined by fitting observed galaxy colours to empirical
templates from Coleman et al. (1980) extended using spectral
synthesis models. There is an additional step (not used for
all template methods) in which the templates are iteratively
adjusted using a training sample. We have performed our
tests on both the DR5 and DR6 template photoz's, and
found no significant differences in performance between the
two.

The second new option is available in the SDSS DR6
SkyServer in the "Photoz2" table. These photoz's were
computed using a neural net (NN) algorithm similar to that of
Collister & Lahav (2004), trained using a training set from
many data sources combined: SDSS spectroscopic samples,
2SLAQ, CFRS, CNOC2, DEEP, DEEP2, and GOODS-N.
A more complete description of both NN photoz's in the
DR6 database can be found in Oyaizu et al. (2007): the
"CC2" photoz's use colours and concentrations, while the
"D1" photoz's use magnitudes and concentrations. In the
text, we will describe any difference between the DR5 and
DR6 results; Oyaizu et al. (2007) recommend against
using the DR5 photoz's for science applications now that the
improved DR6 versions exist.

The third new option we test is the ZEBRA
(Feldmann et al. 2006) algorithm, which has already been
successfully used with much deeper imaging data in the
COSMOS field. This method involves template-fitting, but
also takes a flux-limited sample of galaxies (without
spectroscopic redshifts) from the data source for which we want
photoz's. These data are used to create a Bayesian
modification of the likelihoods based on the N(z) for the full sample
(Brodwin et al. 2006) and on its template distribution. In
practice, this prior helps avoid scatter to low redshifts. A
key question we will address is how this algorithm behaves
with the significantly noisier SDSS photometry. To avoid
confusion, we will refer to the high-quality ZEBRA photoz's
derived using the deep photometry in the COSMOS field
as "ZEBRA" photoz's, and the ZEBRA photoz's using the
much shallower SDSS photometry as "ZEBRA/SDSS"
photoz's.

To be specific about the training method: to get the
ZEBRA/SDSS photoz's, half of a flux-limited sample of SDSS
galaxies with zCOSMOS redshifts was used for template
optimization. This part of the analysis includes fixing the
redshifts of those galaxies to the spectroscopic redshift, finding
the best-fitting template, and optimizing it as described in
Feldmann et al. (2006). Then, a sample of 10^ SDSS
galaxies (flux-limited to r = 22) without spectra was used to
iteratively compute the template-redshift prior.



3.3 Effects of photoz error for lensing 

Finally, we clarify the effects of photoz error on the lensing 
calibration: 

• A positive photoz bias, defined as $\langle z_p - z\rangle > 0$,
will lower the signal (because the critical surface density,
defined below in Eq. 2, will be underestimated).

• A negative photoz bias will raise the signal.

• Photoz scatter will usually lower the signal due to the
shape of the critical surface density near $z_l$. This effect can
be very significant for sources at redshifts below $\sim z_l + 1.5\sigma$,
where $\sigma$ is the size of the scatter.

The last point is very important for a shallow survey
like SDSS when the lens redshift is above $z_l \sim 0.1$, because
of the large number of sources within a few $\sigma$ of the lens
redshift. For a deeper survey such as the Canada-France-Hawaii
Telescope Legacy Survey (CFHTLS), with lenses and
sources separated by $\Delta z \sim 0.5$ on average, this effect may in
fact be negligible. The effects of photoz bias are important
not just in the mean, but as a function of redshift. If
low-redshift sources have nonzero photoz bias, and high-redshift
sources have nonzero photoz bias in the opposite direction,
so that the mean photoz bias for the full sample is zero, the
effect of the opposing photoz biases on lensing calibration
will not, in general, cancel out, since the effect on lensing
calibration tends to be more significant for the sources that
are closer to the lenses.
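The non-cancellation argued above can be checked numerically. The Python below is a toy illustration under stated assumptions, not the calibration procedure of Section 4: it uses the flat cosmology adopted in this paper, a single lens at $z_l = 0.1$, and two made-up sources whose photoz biases are equal and opposite, so the mean photoz bias is zero. For each source it evaluates $\tilde\Sigma_c/\Sigma_c = \Sigma_c^{-1}(z_s)/\Sigma_c^{-1}(z_p)$; the constant $4\pi G/c^2$ cancels in this ratio, and the two sources are given equal weight for simplicity.

```python
import math

OMEGA_M, OMEGA_L = 0.27, 0.73   # flat LCDM adopted in the text
HUBBLE_DISTANCE = 2997.92458    # c / H0 in Mpc/h


def comoving_distance(z, n=200):
    """Comoving distance in Mpc/h by Simpson's rule (n must be even)."""
    if z <= 0.0:
        return 0.0
    step = z / n
    total = 0.0
    for i in range(n + 1):
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight / math.sqrt(OMEGA_M * (1.0 + i * step) ** 3 + OMEGA_L)
    return HUBBLE_DISTANCE * total * step / 3.0


def inv_sigma_crit_shape(z_l, z_s):
    """Sigma_c^{-1} of Eq. (2) without the constant 4 pi G / c^2 (flat universe).

    (1+z_l)^2 D_L D_LS / D_S simplifies to (1+z_l) chi_l (chi_s - chi_l) / chi_s.
    """
    if z_s <= z_l:
        return 0.0
    chi_l, chi_s = comoving_distance(z_l), comoving_distance(z_s)
    return (1.0 + z_l) * chi_l * (chi_s - chi_l) / chi_s


z_lens = 0.1
z_true = [0.25, 0.55]   # hypothetical: one source near the lens, one far behind it
z_phot = [0.20, 0.60]   # photoz biases of -0.05 and +0.05
mean_bias = sum(zp - zt for zt, zp in zip(z_true, z_phot)) / len(z_true)
# Per-source Sigma~_c / Sigma_c = Sigma_c^{-1}(true z) / Sigma_c^{-1}(photoz)
ratios = [inv_sigma_crit_shape(z_lens, zt) / inv_sigma_crit_shape(z_lens, zp)
          for zt, zp in zip(z_true, z_phot)]
b_z = sum(ratios) / len(ratios) - 1.0   # equal weights, for simplicity
```

The near source's ratio exceeds unity by far more than the far source's ratio falls below it, so the net calibration bias is positive (about 9% for these made-up numbers) even though the mean photoz bias vanishes.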

Catastrophic photoz errors are those that are well
beyond the typical scatter, typically occurring due to some
systematic error, colour-redshift degeneracy, or other
problem (and by definition, these photoz's are not flagged as
problematic by the algorithm, so they can only be identified
using a spectroscopic sample with similar selection to the
target sample). The catastrophic error rate may be
important, depending on the type of catastrophic error. For
example, sending a few percent of the sources to $z_p = 0$ will not
lead to calibration bias; it will simply lead to that fraction
of the sources not being included because they have $z_p < z_l$,
causing a percent-level increase in the final error. In short,
the three metrics often used to quantify the
accuracy of photoz methods (the mean bias, scatter, and
catastrophic failure rate) are not sufficient to quantify the
efficacy of a photoz method for lensing. In this paper, we will
introduce a metric that is optimized towards understanding
the effects of photoz's on galaxy-galaxy lensing calibration,
and present results for the photoz mean bias, scatter, and
catastrophic failure rate only as a means of understanding
the results for our lensing-optimized metric. For other
science applications, the optimal metric may be quite different
from what we present here.
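For reference, the conventional summary statistics can be computed as in the Python sketch below. It is an illustration only: the catastrophic-outlier definition $|z_p - z|/(1+z) > 0.15$ is a common convention in the photoz literature and an assumption here, not necessarily the definition adopted in this paper, and the toy sample and function names are hypothetical. The example reproduces the case discussed above: sending 3% of otherwise perfectly measured sources to $z_p = 0$ gives a nonzero mean bias and catastrophic rate, yet a $z_p > z_l$ cut simply discards those sources, and the remaining sources have exact redshifts and hence no redshift calibration bias.

```python
import math


def standard_photoz_metrics(z_spec, z_phot, catastrophic_cut=0.15):
    """Mean bias, rms scatter and catastrophic fraction of z_phot - z_spec.

    The catastrophic definition |dz| / (1 + z_spec) > catastrophic_cut is an
    assumed convention, not necessarily the one used in the paper.
    """
    n = len(z_spec)
    dz = [zp - zs for zs, zp in zip(z_spec, z_phot)]
    bias = sum(dz) / n
    scatter = math.sqrt(sum((d - bias) ** 2 for d in dz) / n)
    n_cat = sum(1 for zs, d in zip(z_spec, dz) if abs(d) / (1.0 + zs) > catastrophic_cut)
    return bias, scatter, n_cat / n


def usable_fraction(z_phot, z_lens, dz_min=0.0):
    """Fraction of sources surviving a z_phot > z_lens + dz_min cut."""
    return sum(1 for zp in z_phot if zp > z_lens + dz_min) / len(z_phot)


# Hypothetical sample: 100 sources at z = 0.3, three of which get z_phot = 0.
z_spec = [0.3] * 100
z_phot = [0.0] * 3 + [0.3] * 97
bias, scatter, f_cat = standard_photoz_metrics(z_spec, z_phot)
kept = usable_fraction(z_phot, z_lens=0.1)
```

Here the bias is $-0.009$ and the catastrophic fraction is 3%, while the 97% of sources that survive the cut are unbiased: the only cost is a slightly larger statistical error.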



4 METHODOLOGY 
4.1 Theory 

Galaxy-galaxy lensing measures the tangential shear
distortions in the shapes of background galaxies induced by the
mass distribution around foreground galaxies (for a review,
see Bartelmann & Schneider 2001). The result is a
measurement of the shear-galaxy cross-correlation as a function of
relative foreground-background separation on the sky. We



© 0000 RAS, MNRAS 000, 000-000 



6 Mandelhaum et al. 



will assume that the redshift of the foreground galaxy is 
known, so we express the relative separation in terms of 
transverse comoving scale R. One can relate the shear
distortion $\gamma_t$ to $\Delta\Sigma(R) = \bar\Sigma(<R) - \Sigma(R)$, where $\Sigma(R)$ is the
surface mass density at the transverse separation R and $\bar\Sigma(<R)$
its mean within R, via

$$\gamma_t = \frac{\Delta\Sigma(R)}{\Sigma_c}. \tag{1}$$

Here we use the critical mass surface density,

$$\Sigma_c = \frac{c^2}{4\pi G}\,\frac{D_S}{(1+z_l)^2 D_L D_{LS}}, \tag{2}$$

where $D_L$ and $D_S$ are angular diameter distances to the lens
and source, $D_{LS}$ is the angular diameter distance between
the lens and source, and the factor of $(1+z_l)^{-2}$ arises due
to our use of comoving coordinates. For a given lens redshift,
$\Sigma_c^{-1}$ rises from zero at $z_s = z_l$ to an asymptotic value at
$z_s \gg z_l$; that asymptotic value is an increasing function of
lens redshift.
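Equation (2) can be evaluated directly. The Python sketch below returns the comoving $\Sigma_c^{-1}$ in $\mathrm{pc^2}\,h^{-1}\,M_\odot^{-1}$ for the flat cosmology assumed in this paper, with distances in $h^{-1}$ Mpc. The numerical value of $4\pi G/c^2$ uses $G = 4.3009\times10^{-3}\,\mathrm{pc}\,(\mathrm{km\,s^{-1}})^2\,M_\odot^{-1}$, and the function names are illustrative rather than taken from any pipeline used in this work. Returning $\Sigma_c^{-1}$ rather than $\Sigma_c$ makes the behaviour described above explicit: it is zero for $z_s \le z_l$ and rises towards an asymptote as $z_s$ grows.

```python
import math

OMEGA_M, OMEGA_L = 0.27, 0.73   # flat LCDM adopted in the text
HUBBLE_DISTANCE = 2997.92458    # c / H0 in Mpc/h
# 4 pi G / c^2 = 6.0135e-13 pc / M_sun; multiplying by 1e6 pc per Mpc lets the
# distances below be given in Mpc/h, so Sigma_c^{-1} comes out in pc^2 / (h M_sun).
FOUR_PI_G_OVER_C2 = 6.0135e-7


def comoving_distance(z, n=200):
    """Comoving distance in Mpc/h by Simpson's rule (n must be even)."""
    if z <= 0.0:
        return 0.0
    step = z / n
    total = 0.0
    for i in range(n + 1):
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight / math.sqrt(OMEGA_M * (1.0 + i * step) ** 3 + OMEGA_L)
    return HUBBLE_DISTANCE * total * step / 3.0


def inv_sigma_crit(z_l, z_s):
    """Comoving Sigma_c^{-1} of Eq. (2) in pc^2 / (h M_sun); zero if z_s <= z_l."""
    if z_s <= z_l:
        return 0.0
    chi_l, chi_s = comoving_distance(z_l), comoving_distance(z_s)
    d_l = chi_l / (1.0 + z_l)              # angular diameter distance to the lens
    d_s = chi_s / (1.0 + z_s)              # angular diameter distance to the source
    d_ls = (chi_s - chi_l) / (1.0 + z_s)   # lens-source distance (flat universe)
    return FOUR_PI_G_OVER_C2 * (1.0 + z_l) ** 2 * d_l * d_ls / d_s
```

For example, `1.0 / inv_sigma_crit(0.1, 0.4)` gives a comoving $\Sigma_c$ of roughly $7000\,h\,M_\odot\,\mathrm{pc}^{-2}$.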

In this work, we focus on calibration bias in $\Delta\Sigma$ due to
bias in $\Sigma_c$ arising from source redshift uncertainty.



4.2 Redshift calibration bias determination 

Here, we present a method for testing the accuracy of source
redshift determination that is optimized towards g-g lensing.
Formally, we wish to calculate the differential surface density
$\Delta\Sigma$ using our estimator $\widetilde{\Delta\Sigma}$, which is defined as a weighted
sum over lens-source pairs j,

$$\widetilde{\Delta\Sigma} = \frac{\sum_j w_j\, \gamma_t^{(j)}\, \tilde\Sigma_{c,j}}{\sum_j w_j}. \tag{3}$$



To isolate the dependence of calibration on redshift-related
quantities, we will assume that the estimated tangential
shear, $\gamma_t$, is unbiased. $\tilde\Sigma_{c,j}$ (derived from our source
redshift estimator) is the critical surface density estimated for
a given lens-source pair j. The weights for each lens-source
pair are determined using redshift information as well:

$$w_j = \frac{1}{\tilde\Sigma_{c,j}^2\left(e_{\rm rms}^2 + \sigma_{e,j}^2\right)}, \tag{4}$$

where $e_{\rm rms}$ is the rms ellipticity per component for the source
sample (shape noise), and $\sigma_{e,j}$ is the ellipticity measurement
error per component.
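A minimal Python sketch of the estimator in equation (3) with the weights of equation (4) follows. It is written in terms of $\tilde\Sigma_{c,j}^{-1}$, since $w_j\,\gamma_t^{(j)}\,\tilde\Sigma_{c,j} = \gamma_t^{(j)}\,\tilde\Sigma_{c,j}^{-1}/(e_{\rm rms}^2 + \sigma_{e,j}^2)$, so that pairs whose photoz places the source in front of the lens receive zero weight without a division by zero. It assumes the per-pair tangential shears have already been converted from ellipticities (responsivity corrections are outside its scope); the function name and the value of $e_{\rm rms}$ in the usage example are hypothetical.

```python
def delta_sigma_estimate(gamma_t, inv_sc_est, e_rms, sigma_e):
    """Eq. (3) with the weights of Eq. (4), written in terms of Sigma~_c^{-1}."""
    num = 0.0
    den = 0.0
    for g, inv_est, se in zip(gamma_t, inv_sc_est, sigma_e):
        noise = e_rms ** 2 + se ** 2   # shape noise plus measurement error
        num += g * inv_est / noise     # equals w_j * gamma_t,j * Sigma~_c,j
        den += inv_est ** 2 / noise    # equals w_j from Eq. (4)
    if den == 0.0:
        raise ValueError("no lens-source pair has nonzero weight")
    return num / den


# Hypothetical usage: three sources behind the lens and one in front of it.
true_delta_sigma = 50.0                             # h M_sun / pc^2, illustrative
inv_sc = [1 / 3000.0, 1 / 5000.0, 1 / 8000.0, 0.0]  # exact Sigma_c^{-1} values
shears = [true_delta_sigma * x for x in inv_sc]     # Eq. (1), noise-free
estimate = delta_sigma_estimate(shears, inv_sc, e_rms=0.37, sigma_e=[0.1, 0.2, 0.3, 0.1])
```

When the estimated critical densities are exact and each pair's shear equals $\Delta\Sigma\,\Sigma_{c,j}^{-1}$, the estimator returns $\Delta\Sigma$ for any choice of weights, which makes a convenient sanity check.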

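We want to relate our estimated $\widetilde{\Delta\Sigma}$ to the true $\Delta\Sigma$. To
do so, we use the relation between the measured shear and
$\Delta\Sigma$, Eq. (1). Putting equation (1), evaluated with the true
critical surface density $\Sigma_{c,j}$, into equation (3) (assuming the
measured shear is an unbiased estimate of $\gamma_t$), we define the
redshift calibration bias via

$$b_z + 1 = \frac{\widetilde{\Delta\Sigma}}{\Delta\Sigma} = \frac{\sum_j w_j\, \tilde\Sigma_{c,j}\, \Sigma_{c,j}^{-1}}{\sum_j w_j}, \tag{5}$$

a weighted sum of the ratio of the estimated to the true
critical surface density.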
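Equation (5) reduces to a weighted mean of $\tilde\Sigma_{c,j}/\Sigma_{c,j}$, which the Python sketch below evaluates at one lens redshift given true (spectroscopic) and estimated (photoz) values of $\Sigma_c^{-1}$ for each source. Working with inverse critical densities lets sources that truly lie in front of the lens (which carry weight but no shear) dilute the signal naturally, while sources placed in front of the lens by their photoz get zero weight. The function name is hypothetical, and the weights follow equation (4).

```python
def calibration_bias(inv_sc_true, inv_sc_est, e_rms, sigma_e):
    """b_z of Eq. (5) at one lens redshift, from inverse critical densities."""
    num = 0.0
    den = 0.0
    for inv_true, inv_est, se in zip(inv_sc_true, inv_sc_est, sigma_e):
        if inv_est <= 0.0:
            continue  # photoz puts the source in front of the lens: zero weight
        w = inv_est ** 2 / (e_rms ** 2 + se ** 2)  # Eq. (4) in terms of Sigma~_c^{-1}
        num += w * inv_true / inv_est              # w_j * Sigma~_c,j / Sigma_c,j
        den += w
    if den == 0.0:
        raise ValueError("no source has nonzero weight")
    return num / den - 1.0


# Hypothetical usage: every photoz overestimates Sigma_c by 10%, so b_z = 0.1.
inv_true = [1e-4, 2e-4, 3e-4]
inv_est = [x / 1.1 for x in inv_true]
b_z = calibration_bias(inv_true, inv_est, e_rms=0.37, sigma_e=[0.1, 0.2, 0.3])
```

A source that is truly in the foreground ($\Sigma_c^{-1} = 0$) but has the same estimated weight as a correctly placed one halves the signal of a two-source sample, giving $b_z = -0.5$.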

This expression must be computed as a function of lens
redshift. In the limit that the sources are at much higher
redshift than the lenses, $\Sigma_c$ does not depend as strongly on
the source redshift, so (for a given photometric redshift bias)
$|b_z|$ will be smaller than if the lens redshift is just below the
source redshift. For a lens sample with redshift distribution
$p(z_l)$, the average calibration bias $\langle b_z\rangle$ can be computed as
a weighted average over the redshift distribution,

$$\langle b_z\rangle = \frac{\int dz_l\, p(z_l)\, w_l(z_l)\, b_z(z_l)}{\int dz_l\, p(z_l)\, w_l(z_l)}, \tag{6}$$

where the redshift-dependent lens weight $w_l(z_l)$ is defined
as the total weight derived from all sources that contribute
to the lensing signal for a given lens redshift, $\sum_j w_j$.
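Given $b_z(z_l)$ and $w_l(z_l)$ tabulated on a grid of lens redshifts, equation (6) is a ratio of two one-dimensional integrals. The Python sketch below evaluates both with the trapezoidal rule; the grid, the choice of quadrature, and the function names are illustrative assumptions rather than the procedure used for the results in this paper.

```python
def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y on the increasing grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def mean_calibration_bias(z_grid, p_zl, w_zl, b_zl):
    """<b_z> of Eq. (6): lens-weighted average of b_z(z_l) over p(z_l)."""
    numerator = trapezoid([p * w * b for p, w, b in zip(p_zl, w_zl, b_zl)], z_grid)
    denominator = trapezoid([p * w for p, w in zip(p_zl, w_zl)], z_grid)
    return numerator / denominator


# Hypothetical usage: uniform p(z_l) and w_l(z_l), b_z rising linearly with z_l.
z_grid = [0.0, 0.1, 0.2]
avg = mean_calibration_bias(z_grid, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.01, 0.02])
```

Because $w_l(z_l)$ multiplies $p(z_l)$, lens redshifts that receive more total source weight dominate $\langle b_z\rangle$ even when $p(z_l)$ alone would not favour them.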

In the ideal case, we would do this calculation with a
large, complete spectroscopic sample drawn at random from
our source sample, sparsely sampled on the sky and therefore
lacking features in the redshift distribution due to large-scale
structure. We can then find $b_z(z_l)$ on a grid of lens redshifts
by forming the sums in equation (5) using all sources with
spectra. Finally, we can use the total weight as a function of
lens redshift and the lens redshift distribution to estimate
the average redshift bias of the lensing signal.

To get the errors on the bias in this simple scenario, we can simply bootstrap resample our sample of source galaxies with spectroscopy. For a sample of N_gal galaxies, bootstrap resampling requires us to make many "new" galaxy samples consisting of N_gal galaxies drawn from the original sample with replacement. Assuming that the observed galaxy redshifts accurately reflect the underlying redshift distribution, and that the redshifts are uncorrelated, the mean best-fit redshift distribution will reflect the true one, and the errors in the redshift calibration bias can be determined from the variance of the calibration biases for the bootstrap-resampled datasets. Since the bootstrap assumes that the objects being resampled are independent, this method only gives proper errors when LSS is unimportant.
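Because b_z + 1 in Eq. (5) is a weighted mean of per-source ratios Σ̃_c/Σ_c, the galaxy-level bootstrap described above can be sketched as follows. The inputs (per-source weights and ratios), the number of resamples, the fixed seed, and the function name are assumptions for illustration; as the text notes, the result is only meaningful when LSS is negligible.

```python
import numpy as np


def bootstrap_error(weights, ratios, n_boot=1000, seed=0):
    """Std. dev. of b_z = sum(w*r)/sum(w) - 1 over bootstrap resamples of galaxies.

    Only valid if galaxies are independent (i.e. LSS is unimportant).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    r = np.asarray(ratios, dtype=float)
    n = len(w)
    b = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw N_gal galaxies with replacement
        b[k] = np.sum(w[idx] * r[idx]) / np.sum(w[idx]) - 1.0
    return b.std(ddof=1)
```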

In general, several problems take us away from this ideal case. The first is sampling variance, since most redshift surveys are completed in a well-defined, small region of the sky. The second is that most redshift surveys suffer from some incompleteness, which may depend on apparent magnitude or colour, so that the loss of those redshifts can make the spectroscopic sample no longer comparable to the full source sample. We attempt to ameliorate these problems by using two sources of spectroscopy on different areas of the sky and with different spectrographs and analysis pipelines, so that the LSS and incompleteness tendencies in each sample are different. Below, we address these deviations from the ideal case in more detail.



4.3 Effects of sampling variance 

Large-scale structure can be problematic when using surveys on small regions of the sky to determine the bias in the lensing signal due to photometric redshift error. The LSS may emphasize particular regions of the source redshift distribution that have unusual features in the photometric redshift errors. To avoid this problem, we would like to fit for a redshift distribution in a way that properly accounts for uncertainties due to sampling variance. There are many approaches to this problem in the literature, such as that demonstrated in Brodwin et al. (2006).

The simplest way around the problem described above (LSS causes the redshifts to be correlated, violating the assumption behind the bootstrap) is to bootstrap the bins in the redshift histogram instead. In the limit that the bins are significantly wider than the typical sample correlation length, the correlations within the bins will be far more important than the correlations between adjacent bins. Thus, the requirement that the bootstrapped data points be independent is much closer to being fulfilled. Here, we use redshift bins of size Δz = 0.05, where each bin is considered as a pair of points (z_i, N(z_i)). In a given bootstrapped histogram, some redshift bins (z_i, N(z_i)) will be included multiple times and others not at all, but each time a given bin is used, it has the same number of galaxies as in the real data. While this method is simplistic, it has the advantage of not requiring us to understand the details of the sample selection, since the lensing selection is a very non-trivial cut to understand and simulate. The resulting errors on the best-fit N(z) from this bootstrap will include the effects of both Poisson error (which is non-negligible given the size of the samples used) and large-scale structure. The errors are valid assuming that there are no correlations between the 150 h^{-1} Mpc-wide bins. We discuss this assumption, which depends not just on straightforward integration of the matter power spectrum but also on redshift-space distortions, galaxy bias, and magnification bias, further in a later section.
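A minimal sketch of the bin-level bootstrap, in which whole (z_i, N(z_i)) pairs rather than individual galaxies are drawn with replacement. The function name and the fixed random seed are illustrative assumptions; the default of 200 resamples follows the number used for the z > 0.7 fractions in Section 5.1.

```python
import numpy as np


def bootstrap_histograms(z_centers, counts, n_boot=200, seed=0):
    """Resample (z_i, N_i) pairs of a redshift histogram with replacement.

    A bin drawn twice keeps its original count both times; an undrawn bin
    is simply absent from that resample.
    """
    rng = np.random.default_rng(seed)
    z_centers = np.asarray(z_centers, dtype=float)
    counts = np.asarray(counts)
    n_bins = len(z_centers)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_bins, size=n_bins)
        samples.append((z_centers[idx], counts[idx]))
    return samples
```

Each resampled histogram is then fitted separately, so the scatter among the best-fit N(z) carries both Poisson and LSS fluctuations.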

For each bootstrapped histogram with bins centered at z_i containing N_i galaxies each, we minimize the function

$$
\sum_i w_i \left( N_i - N_i^{(\mathrm{model})} \right)^2 \qquad (7)
$$

via summation over redshift bins i. N_i^(model) is the number of galaxies predicted to lie in bin i given the model for dN/dz, i.e.

$$
N_i^{(\mathrm{model})} = \int_{z_i - \Delta z/2}^{z_i + \Delta z/2} \frac{dN^{(\mathrm{model})}}{dz} \, dz. \qquad (8)
$$

For each bootstrapped histogram, we also imposed a normalization condition on the fit, ∫ (dN^(model)/dz) dz = N_gal (the total number of galaxies in the spectroscopic sample). In the case of Poisson error, the natural choice for w_i is 1/N_i^(model). However, in the presence of LSS, which contributes significantly to the variance in each bin, the distribution of values in each bin is, in fact, unknown, so the optimal weighting scheme is unclear. Consequently, we use the simplest possible weighting scheme, w_i = 1 for all i. We have, however, confirmed that if we do use w_i = 1/N_i^(model), then the changes in the best-fit redshift distribution parameters, and the implied changes in redshift calibration bias, are well below the 1σ level.

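Our 2-parameter model for the redshift distribution is

$$
\frac{dN}{dz} \propto z^{a-1} \exp\left[-0.5 \, (z/z_*)^2\right], \qquad (9)
$$

which has mean redshift

$$
\langle z \rangle = \frac{\sqrt{2} \, z_* \, \Gamma[(a+1)/2]}{\Gamma(a/2)}. \qquad (10)
$$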
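Eq. (10) follows from the moments ∫_0^∞ z^n exp[-z²/(2z_*²)] dz = ½ (2z_*²)^{(n+1)/2} Γ[(n+1)/2], taken with n = a and n = a - 1. The short check below compares the closed form with a direct numerical average over Eq. (9); the integration range, grid size, and function names are arbitrary choices.

```python
import math

import numpy as np


def mean_redshift(z_star, a):
    """Eq. (10): <z> = sqrt(2) z_* Gamma[(a+1)/2] / Gamma(a/2)."""
    return math.sqrt(2.0) * z_star * math.gamma((a + 1) / 2) / math.gamma(a / 2)


def mean_redshift_numeric(z_star, a, z_max=5.0, n=200001):
    """Direct average of z over the (unnormalized) Eq. (9) shape on a uniform grid."""
    z = np.linspace(0.0, z_max, n)
    f = z ** (a - 1) * np.exp(-0.5 * (z / z_star) ** 2)
    return np.sum(z * f) / np.sum(f)  # uniform spacing cancels in the ratio
```

For a = 2 the distribution is a Rayleigh law, and Eq. (10) reduces to ⟨z⟩ = z_* √(π/2).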



This choice is based purely on the empirical observation that it describes the shape of the redshift distribution better than the many other functional forms that we tried, and the addition of extra parameters did not significantly improve the best-fit χ². In particular, allowing the power inside the exponent to differ from 2 (a common choice) did not lead to any significant change in the best-fit redshift distribution below z = 0.8, where the vast majority of the galaxies are located. The changes above that redshift are marginally statistically significant, but there are so few sources above that redshift that our final results for the redshift bias do not change within the statistical error.

We will present best-fit redshift distributions for zCOS- 
MOS and DEEP2 EGS separately to demonstrate that the 
results are consistent within the errors. We then use both 
samples combined to create an overall redshift distribution. 
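The fit of Eqs. (7) to (9) to one (bootstrapped) histogram can be sketched as below, with w_i = 1 and the normalization imposed analytically through the model. This is an illustration under stated assumptions: a brute-force grid search stands in for whatever minimizer was actually used, and the parameter ranges, grid steps, sub-bin sampling, and function names are all hypothetical.

```python
import math

import numpy as np


def model_counts(z_centers, dz, z_star, a, n_gal, n_sub=41):
    """N_i^(model) of Eq. (8) for the Eq. (9) shape, normalized to n_gal galaxies."""
    zc = np.asarray(z_centers, dtype=float)
    lo = np.maximum(zc - dz / 2, 0.0)
    hi = zc + dz / 2
    t = np.linspace(0.0, 1.0, n_sub)
    z = lo[:, None] + (hi - lo)[:, None] * t[None, :]      # sub-grid inside each bin
    f = z ** (a - 1) * np.exp(-0.5 * (z / z_star) ** 2)   # unnormalized dN/dz
    in_bin = np.sum(0.5 * (f[:, 1:] + f[:, :-1]) * np.diff(z, axis=1), axis=1)
    # Analytic integral of the shape over 0 < z < infinity (requires a > 0)
    total = 0.5 * (2.0 * z_star**2) ** (a / 2) * math.gamma(a / 2)
    return n_gal * in_bin / total


def fit_nz(z_centers, counts, dz=0.05, n_gal=None, weights=None):
    """Grid-search minimum of Eq. (7), sum_i w_i (N_i - N_i^(model))^2."""
    counts = np.asarray(counts, dtype=float)
    n_gal = counts.sum() if n_gal is None else n_gal   # normalization condition
    w = np.ones_like(counts) if weights is None else np.asarray(weights, float)
    best = (np.inf, None, None)
    for z_star in np.arange(0.15, 0.4501, 0.005):
        for a in np.arange(1.5, 4.0001, 0.02):
            resid = counts - model_counts(z_centers, dz, z_star, a, n_gal)
            cost = np.sum(w * resid**2)
            if cost < best[0]:
                best = (cost, z_star, a)
    return best[1], best[2]
```

Passing `weights=1/N_i^(model)` from a first fit would reproduce the Poisson-weighted variant that the text reports as giving changes well below 1σ.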

This distribution is crucial to our scheme to avoid sampling variance effects in the determination of the redshift calibration bias. To counterbalance regions of source redshift space that are over- or under-represented in our spectroscopic sample due to LSS fluctuations, we incorporate an additional weight into the calculation of the redshift bias in Eq. (5). For a galaxy in redshift bin i of our histogram, the LSS weight (w_LSS) is the ratio of the number of galaxies predicted to lie in bin i by our best-fit redshift distribution to the number actually found in that bin (N_i^(model)/N_i). Thus, regions of redshift space with too many (too few) galaxies due to LSS or Poisson fluctuations will be down-weighted (up-weighted) appropriately. We can then get errors on the average redshift bias by using the best-fit redshift distribution for each bootstrap-resampled histogram to derive the LSS weights. This procedure incorporates uncertainty in the source redshift distribution appropriately, since we never need to bootstrap the galaxies themselves.
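Given the best-fit counts N_i^(model) for each histogram bin, the LSS weight of each galaxy is a lookup by bin. A minimal sketch with hypothetical names; the bin edges are assumed to cover every galaxy.

```python
import numpy as np


def lss_weights(z_gal, bin_edges, n_model):
    """Per-galaxy LSS weight w_LSS = N_i^(model) / N_i for the bin i holding it."""
    z_gal = np.asarray(z_gal, dtype=float)
    n_obs, _ = np.histogram(z_gal, bins=bin_edges)
    # Bin index of each galaxy (clipped so a galaxy on the last edge stays in range)
    idx = np.clip(np.digitize(z_gal, bin_edges) - 1, 0, len(n_obs) - 1)
    # Only occupied bins are ever looked up; max(., 1) just avoids 0/0 in empty ones
    ratio = np.asarray(n_model, dtype=float) / np.maximum(n_obs, 1)
    return ratio[idx]
```

When every bin is occupied, the weights sum to the total predicted count, so the reweighted spectroscopic sample follows the smooth best-fit N(z).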

In an analysis containing many patches of sky, the size 
of the errors can be verified by comparing the redshift bias 
computed in each patch of sky. Unfortunately, with only two 
patches of sky, this method is not an option for this work. 



4.4 Redshift incompleteness and failures 

For precision results, we require a high redshift completeness 
and quality. There are several tests that we can carry out to 
ensure that the sample is of high quality. We consider the 
redshift failures separately for the DEEP2 and zCOSMOS 
samples. In both cases, we will determine the magnitude and 
colour distribution of the failures relative to the full sample, 
to see if a particular region of redshift space is causing the 
problems. 

For zCOSMOS, there are high-quality photoz's derived 
from very deep photometry which we can use in the case 
of spectroscopic redshift failure. To control for any effect on 
the computed redshift calibration bias, we also check the 
results using the zCOSMOS photoz's for a larger portion of 
the full sample, to ensure that noise in these photoz's has a 
negligible effect on the results. 

For DEEP2 EGS, we lack redshift estimates for the failures. To place a very conservative bound on the effect of failures on the estimated calibration bias, we estimate the redshift bias with all the failures forced to z = 0, and then to z = 1.5. For both surveys, we will compare the ranges of colours and redshifts spanned by the successes and failures, to ensure that our procedures for handling redshift failures are justified.

The next issue is the quality of the non-failed redshifts, which in DEEP2 is assessed by visual inspection and repeat observations, and in zCOSMOS using the photoz's as well. For DEEP2, we have used only Q = 3 and Q = 4 redshifts, which make up 96% of our sample and are estimated to be 95% and 99.5% reliable, respectively. For zCOSMOS, the reliabilities for Q = 3 and 4 objects (92% of our sample) are > 99%. For this survey we also use Q = 2.5 (objects of slightly lower quality in principle, but with extremely good matches between the spectroscopic and photometric redshifts) and Q = 9.5 (single-line redshifts with good matches between the spectroscopic and photometric redshifts, which in this apparent magnitude and redshift range are usually from Hα), both of which are also > 99% reliable as determined from repeat observations.

In the DEEP2 EGS, there are also minor selection ef- 
fects to control for. The first effect is the fact that no galaxies 
brighter than r ~ 18.5 were targeted. Galaxies brighter than 
that limit constitute only 4% of the source sample, but we 
nonetheless include tests of the effect this has on the result. 

The other selection effect in DEEP2 EGS occurs at magnitudes fainter than R = 21.5, where z < 0.75 objects are given slightly lower selection weights than higher-z galaxies. While the fraction of source galaxies fainter than this magnitude is only ~12%, we use their selection probabilities p_sel to properly compensate for this effect. To be explicit, the total weight for each source is thus the product of the lensing weight w_j, the LSS weight w_LSS, and max(p_sel)/p_sel,j (or 1 for the zCOSMOS galaxies).
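The weight product just described, as a minimal sketch (names are hypothetical; `p_sel` is assumed to hold the DEEP2 targeting probabilities of all sources in that sample, so that its maximum is taken over the same sample):

```python
import numpy as np


def total_source_weight(w_lens, w_lss, p_sel=None):
    """w_j * w_LSS * max(p_sel)/p_sel,j; the last factor is 1 when p_sel is None."""
    w = np.asarray(w_lens, dtype=float) * np.asarray(w_lss, dtype=float)
    if p_sel is None:          # e.g. zCOSMOS: no targeting-probability correction
        return w
    p = np.asarray(p_sel, dtype=float)
    return w * p.max() / p     # up-weight sources with lower selection probability
```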

Finally, we clarify our statement that our method requires the spectroscopic sample used to evaluate photoz's to be comparable to the source sample. As demonstrated above, it is possible to use weights to account for well-defined targeting priorities that might make the spectroscopic sample slightly non-representative of the source catalog. Thus, our requirement that the spectroscopic sample be comparable to the source sample is really a statement that it must contain all galaxy types (spectral types, magnitudes, etc.) in the source sample, with representation levels sufficient to overcome the noise. If some reweighting is necessary to account for under- or over-representation of a given population, this is sufficient for our purposes. Thus, one could not use a spectroscopic sample with a strict cutoff two magnitudes brighter than the flux limit of the source catalog. One could use a spectroscopic sample that has a lower redshift success rate for fainter galaxies, as long as that lower success rate is due to statistical error, so that the failures have the same redshift distribution as the successes, rather than some systematic error (e.g. inability to determine redshifts for any object of a particular spectral type above some cutoff redshift). Reweighting schemes to account for different fractions of various galaxy populations in the training and photometric samples are being used successfully by the SDSS neural net photoz group to predict redshift distributions and photoz error distributions in the photometric samples.†

† Lima, Cunha, Oyaizu, Lin, Frieman, 2007, in prep.

4.5 Direct use of photoz's

Here, we explain our use of photoz's directly for Σ_c estimation. One might argue that since we have a spectroscopic sample, we should estimate Σ_c using a deconvolved photoz error distribution. However, in this paper we test the use of photoz's directly, for several reasons.



First, as we have argued previously, a key advantage of 
using photoz's is that we can eliminate intrinsically-aligned 
sources. Once we start eliminating sources from the sample 
on the basis of detailed cuts on photoz, colour, or apparent 
magnitude, we would have to re-estimate the photoz error 
distribution for the sample that passes these cuts and redo 
the deconvolution procedure. This is computationally expen- 
sive and potentially difficult to do robustly, if the cuts result 
in our photoz error distribution being poorly-determined 
due to insufficient spectroscopic galaxies that pass the cuts 
to properly sample the distribution. We would therefore like 
to find a photoz method that can lead to accurate lensing 
calibration on its own. 

There is, in principle, one simple option that might im- 
prove the lensing calibration and that can be done without 
full deconvolution: we can correct each photoz for the mean 
photoz bias. To be accurate, this should be done as a func- 
tion of galaxy colour and magnitude. We will test the results 
of doing so for one of the photoz methods when we present 
the results of our analysis. 
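The mean-bias correction mentioned above could be implemented by tabulating ⟨z_p - z_s⟩ from the spectroscopic matches in cells of colour and magnitude and subtracting it from each photoz. This is a hypothetical sketch: the binning scheme actually used may differ, and the cell edges, function names, and the choice to apply no correction in empty cells are all assumptions.

```python
import numpy as np


def _cells(colour, mag, c_edges, m_edges):
    """Colour and magnitude cell indices, clipped into the table."""
    ic = np.clip(np.digitize(colour, c_edges) - 1, 0, len(c_edges) - 2)
    im = np.clip(np.digitize(mag, m_edges) - 1, 0, len(m_edges) - 2)
    return ic, im


def mean_bias_table(z_phot, z_spec, colour, mag, c_edges, m_edges):
    """Mean photoz bias <z_p - z_s> in each (colour, magnitude) cell; 0 where empty."""
    ic, im = _cells(colour, mag, c_edges, m_edges)
    shape = (len(c_edges) - 1, len(m_edges) - 1)
    total, count = np.zeros(shape), np.zeros(shape)
    np.add.at(total, (ic, im), np.asarray(z_phot, float) - np.asarray(z_spec, float))
    np.add.at(count, (ic, im), 1.0)
    return np.divide(total, count, out=np.zeros(shape), where=count > 0)


def debias_photoz(z_phot, colour, mag, table, c_edges, m_edges):
    """Subtract the tabulated mean bias of each object's cell from its photoz."""
    ic, im = _cells(colour, mag, c_edges, m_edges)
    return np.asarray(z_phot, float) - table[ic, im]
```

Note that removing the mean bias leaves the scatter and any skewness untouched, which is why a lensing-specific statistic such as b_z is still needed to judge the result.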

The final reason to use photoz's directly is that this is the approach taken in many lensing papers to date, and we would like to test the accuracy of what is currently done in the field to see what improvements need to be made. In section 5.9 we will consider using a full p(z) as a new alternative to using the photoz alone.



5 RESULTS: APPLICATION TO SDSS LENSING

5.1 Matching results 

There are 1013 and 1825 galaxies in our source catalog with spectra from DEEP2 EGS and zCOSMOS, respectively (including redshift failures). We now characterize these matches relative to the entire source catalog and relative to each other.

Figure 2 shows the redshift histograms for matches between the source catalog and the zCOSMOS and DEEP2 samples. The zCOSMOS histogram is shown both with and without precision photometric redshifts for the redshift failures, whereas for DEEP2, the failures (4%) were excluded entirely. As shown, there is significant large-scale structure in the redshift histograms, but it is not correlated between the two samples. Visually, the redshift histogram for DEEP2 appears to be at slightly higher redshift on average. We assess the statistical significance of any differences below.

Figure 3 shows the distribution of apparent r-band magnitude p(r) for the zCOSMOS and DEEP2 matches relative to that of the entire source catalog, p_ref(r). The apparent magnitude histogram for zCOSMOS is quite similar to that for the full source catalog (within the noise), and the failures are predominantly at the faint end. The apparent magnitude histogram for DEEP2 shows the deficit at r < 18.5 (4% of the sample) due to targeting constraints.

Of the matches, 151 of those in zCOSMOS (8%) and 38 of those in DEEP2 (4%) are redshift failures (where failures are defined as having redshift success rates below 99%). Figs. 3 and 4 compare the zCOSMOS and DEEP2 failures with the full sample. Fig. 3 shows the relation of the failures to the general sample as a function of apparent magnitude; the top part of Fig. 4 shows that the colour distribution for the failures is similar to that for the successes. We thus have no reason to believe the failures lie in a particular region of redshift space. The DEEP2 failures lie in the < z < 0.75 colour locus, just like the majority of the successes in this bright subsample of the EGS data. (This is not true for deeper redshift samples, such as the other DEEP2 fields, where failures typically occur for blue, z > 1.5 galaxies. The flux and apparent size cuts imposed on our sample essentially remove any such galaxies.) Inspection of the 38 DEEP2 spectra suggests that the redshift distribution is similar to that for the successes, with failures due to bad astrometry, a bad column running through the spectrum, or similar problems that do not correlate with redshift. We also show the zCOSMOS photoz error distribution as a function of redshift in the bottom of Fig. 4 for spectroscopic redshift successes. The photoz errors for this sample are indeed as small as, or even smaller than, those presented elsewhere for these photoz's (Feldmann et al. 2006). We may view this error as a "systematic floor" to the error, with the increase in error for the ZEBRA/SDSS photoz's being ascribed to the much noisier photometry. We will see that this statistical error dominates the error budget.

Figure 2. Redshift histogram for the matches between the source catalog and the spectroscopic samples.

Figure 3. Bottom: r-band apparent magnitude histogram for the full source catalog. Top: Difference between the apparent magnitude histogram for the zCOSMOS and DEEP2 samples relative to that for the full source catalog.

Figure 4. Colour-magnitude scatter plots for redshift successes and failures in zCOSMOS (top) and DEEP2 (middle). Successes are shown as black points and failures as blue hexagons. The bottom panel shows the zCOSMOS photoz error as a function of redshift for the redshift successes, including the 68% CL errors as a function of redshift (red lines).

Next, we present redshift distributions for each survey separately, with two purposes: (1) to demonstrate that they are consistent with being drawn from the same underlying redshift distribution, and (2) to determine the weights that compensate for sampling variance as described in section 4.3.

Table 1. Parameters of fits to the redshift distribution from Eq. (9).

Sample       z_*              a              ⟨z⟩
zCOSMOS      0.259 ± 0.040    2.58 ± 0.58    0.369 ± 0.018
DEEP2 EGS    0.300 ± 0.041    2.35 ± 0.41    0.408 ± 0.025
Both         0.275 ± 0.025    2.42 ± 0.36    0.382 ± 0.012

Fig. 5 shows the observed and best-fit redshift histograms for zCOSMOS, DEEP2, and both surveys combined. Table 1 shows the corresponding best-fit parameters from Eq. (9). The weighting to account for the DEEP2 selection at R > 21.5 causes a negligible change in the results. By bootstrapping the redshift histogram as described in section 4.3, we have determined the median predicted number of galaxies in each bin, and the 68% confidence limits on that number, as shown on the plot. Because we have imposed a normalization condition on the fit, the error bars are correlated between various parts of the histogram. We can see from the plot and Table 1 that while the DEEP2 sample is at slightly higher redshift on average, the redshift distributions from zCOSMOS and DEEP2 are consistent with each other within the (Poisson plus LSS) errors. While it is difficult to compare the curves for z > 0.7, where the number of galaxies has declined sharply, we can compare the total fraction of the sample with z > 0.7 to show that they are consistent: for DEEP2 EGS, this fraction lies in [0.05, 0.12] at the 68% CL; for zCOSMOS, in [0.02, 0.08]. These limits were determined using the fraction above z = 0.7 for the best-fit N(z) of 200 bootstrap-resampled redshift histograms, and therefore include both Poisson error and sampling variance. Clearly, any discrepancy between the best-fit zCOSMOS and DEEP2 redshift histograms in the fraction of the sample above z = 0.7 is not significant at the 68% CL.
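The 68% CL on the fraction above z = 0.7 can be sketched as follows, assuming one has the best-fit (z_*, a) of Eq. (9) for each bootstrap-resampled histogram. Taking the central 16th to 84th percentile interval as the 68% CL, the integration grid, and the function names are our assumptions.

```python
import numpy as np


def fraction_above(z_star, a, z_cut=0.7, z_max=5.0, n=20001):
    """Fraction of the Eq. (9) distribution with z > z_cut (uniform-grid sum)."""
    z = np.linspace(0.0, z_max, n)
    f = z ** (a - 1) * np.exp(-0.5 * (z / z_star) ** 2)
    return f[z > z_cut].sum() / f.sum()


def fraction_cl(best_fit_params, z_cut=0.7, level=68.0):
    """Central `level`% interval of the fraction over bootstrap best-fit (z_*, a)."""
    fr = np.array([fraction_above(zs, a, z_cut) for zs, a in best_fit_params])
    half = (100.0 - level) / 2.0
    return np.percentile(fr, [half, 100.0 - half])
```

Because each fraction comes from a smooth best-fit N(z) rather than from raw counts, the interval is less sensitive to the few galaxies actually observed above z = 0.7.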

As shown in the lower left panel of Fig. 5, there is no systematic tendency for the observed and best-fit N_i(z) for the full sample to deviate from each other, only Poisson and LSS fluctuations, so the form we have chosen for dN/dz is acceptable. (The fluctuations are quite large for z > 1 because the best-fit N_i^(model) drops below 1, so discreteness will cause the ratio N_i/N_i^(model) to be either zero or some large number.) It is important to note that this plot shows the unweighted redshift distribution; inclusion of the lensing weights will change the effective source redshift distribution.



5.2 Photoz error distributions 

As a way of understanding the trends in our lensing-optimized photoz error statistic b_z, we first examine the photoz error distribution as a function of redshift. Figure 6 shows the photoz error as a function of the (true) redshift for the lensing-selected galaxies from zCOSMOS and DEEP2 for the photoz algorithms tested in this work. The galaxies are divided by apparent magnitude into three samples with r < 20, 20 ≤ r < 21, and r ≥ 21, and we show the 68% CL errors determined in bins of size Δz = 0.05 for each apparent magnitude bin. For all methods, the error distributions tend to be highly non-Gaussian, often skewed and with significant tails. While the requirement that z_p > 0 makes skewness inevitable at low z even for a well-behaved photoz estimator, the effect persists to such high redshift for all methods that this constraint is clearly not the cause. Thus, the 68% confidence limits as a function of redshift are more useful than a calculation of the average photoz bias and scatter. Nonetheless, we do tabulate the mean bias ⟨z_p - z⟩ and the overall scatter σ(z_p) in Table 2 for each method, for the full sample and the r < 21 subset (to facilitate comparison between kphotoz, used only for r < 21, and the other methods).
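The binned 68% CL curves of the photoz error can be sketched as the 16th and 84th percentiles of z_p - z_s in bins of true redshift of width Δz = 0.05. The percentile convention, the minimum of five objects per bin, the upper redshift limit, and the function name are assumptions for illustration.

```python
import numpy as np


def photoz_error_cl(z_spec, z_phot, dz=0.05, z_max=1.2, min_count=5):
    """16th/84th percentiles of z_p - z_s in bins of true redshift of width dz."""
    z_spec = np.asarray(z_spec, dtype=float)
    err = np.asarray(z_phot, dtype=float) - z_spec
    edges = np.arange(0.0, z_max + dz / 2, dz)
    centers, lower, upper = [], [], []
    for left, right in zip(edges[:-1], edges[1:]):
        sel = (z_spec >= left) & (z_spec < right)
        if sel.sum() < min_count:  # too few objects for a stable interval
            continue
        p16, p84 = np.percentile(err[sel], [16, 84])
        centers.append(0.5 * (left + right))
        lower.append(p16)
        upper.append(p84)
    return np.array(centers), np.array(lower), np.array(upper)
```

Unlike a mean and RMS, these percentiles stay informative for the skewed, long-tailed error distributions described above; running the function separately on each apparent-magnitude subsample gives one curve pair per magnitude bin.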

For the kphotoz method, there is a clear tendency to fail towards very low redshift, as demonstrated by the peak in p(z_p) for z_p < 0.05. For lensing, such failures will be flagged as being below the lens redshift for nearly all relevant lens redshifts, thus excluding them from the source sample. Consequently, the only effect of this failure mode is to reduce the number of available sources, not to bias the weak lensing results. However, it is apparent that this method is as noisy for r < 21 as the other photoz algorithms are for r < 21.8, and that the photoz error tends to be positive for z < 0.4 and negative above that.

Table 2. Mean properties of the photoz algorithms, for the full sample and, in parentheses, for r < 21 only.

Method        Mean bias          Scatter
kphotoz       n/a (-0.015)       n/a (0.14)
Template      -0.064 (-0.043)    0.16 (0.12)
NN/CC2        0.034 (0.013)      0.14 (0.11)
NN/D1         0.038 (0.020)      0.13 (0.10)
ZEBRA/SDSS    -0.014 (0.012)     0.15 (0.12)

For the template-based database photoz's, there is an even stronger failure mode towards z_p = 0 than for kphotoz (because the template method goes fainter than the kphotoz sample). This failure mode contributes to the significantly negative 68% CL limits on the photoz error; the points suggest that ignoring these failures would lead to a more symmetric error distribution. We must quantify the effect this has in reducing the total weight: even if the bias in the lensing signal due to the strong failure mode is small, the increased statistical error due to loss of sources may be problematic. This failure mode is the cause of the large mean photoz bias in Table 2.

For the neural network algorithm, the plot shows the CC2 (colour- and concentration-based) photoz's, but the trends are qualitatively similar for the D1 (magnitude- and concentration-based) photoz's. There are entries for both versions in Table 2. As shown, the method has a reasonably small overall scatter and no major failure modes. We caution the reader that the same is not true for the NN photoz's in the DR5 database, for which there is a significant scatter to redshifts 0.75 < z_p < 1 that more than doubles the number of sources estimated to be in this redshift range. The scatter is also larger for the DR5 NN photoz's. In both the DR5 and the DR6 versions, there is a tendency towards positive photoz bias at low-intermediate redshifts (0 < z < 0.4) that may bias the lensing signal low.

Finally, the ZEBRA/SDSS method also lacks a major catastrophic failure mode and has reasonably small overall photoz bias. The redshift histograms derived from the spectroscopic and photometric redshifts agree remarkably well. As for the NN/CC2 photoz's, there is a trend towards positive photoz error at low redshift and negative error at high redshift. Because of the overall lower number of sources above z = 0.4, and the decreased dependence of Σ_c on source redshift at higher redshift, we have no reason to believe that the calibration biases of opposite sign in the lensing signal will cancel out. We can also conclude, by comparison with the ZEBRA photoz errors in the lower panel of Fig. 4 (which use the far deeper COSMOS photometry) for exactly the same set of sources, that for the redshifts and magnitudes that dominate this source sample, statistical error due to noisy SDSS photometry dominates over systematic error in this photoz method.

5.3 Redshift bias 

In Figure 7 we show the lensing calibration bias b_z(z_l) for different source redshift determination methods, using the full lensing-selected spectroscopic redshift sample. The bottom panel shows the total lensing weight ascribed to the source sample for that lens redshift, determined via summation over the lensing weights described in Section 4.4. Note that the r < 21 and LRG samples use photoz's with the requirement that z_p > z_l + 0.1, to reduce contamination by physically-associated sources (for consistency with our previous analyses). However, for the new photoz methods, we have not imposed any such condition (we will revisit this choice later).

Figure 5. Top: Rescaled redshift histograms for the matches between the source catalog and the zCOSMOS (left) and DEEP2 (right) samples with best-fit histograms. The black histogram is the observed data, the smooth red curve is the best-fit histogram, the dashed magenta lines are the ±1σ errors, and the dotted blue line is the best-fit redshift histogram for the other survey. Bottom right: Same as above, for the combined sample, with the dotted blue lines showing the results for each survey separately. Bottom left: ratio of observed to best-fit N(z) for the combined sample.

As shown, the r < 21 sample with photoz's from kphotoz has a significant negative calibration bias that increases with lens redshift to -35% at z_l = 0.35. As for all methods, the bias worsens with lens redshift because, for a given source with some photoz error, a higher lens redshift leads to a higher relative error in Σ_c. The r > 21 sample (using dN/dz from DEEP2 EGS) has a small positive bias that increases to 10% at z_l = 0.35. We assess the significance of these biases for our previous work later in this section. The results for the LRG source sample confirm our assertion in previous works that for z_l < 0.3, this sample is essentially free of redshift bias.

The lack of significant redshift calibration bias for the template photoz code for z_l < 0.25 can be explained by the trends in Fig. 6: the calibration bias due to the slight negative photoz bias balances out the calibration bias due to photoz scatter. Even at higher redshift, the redshift calibration bias, while nonzero, is less significant than for the other photoz methods. The neural net and ZEBRA/SDSS photoz's, however, have significant negative bias (-30% to -20% at z_l = 0.4), presumably because of the aforementioned tendency to positive photoz bias for z_s < 0.4. This difference between the three methods is also the reason why the latter two methods have high total weight for the range of lens redshift considered here, whereas the template photoz code has lower weight (a) because of its scatter to low photoz (which eliminates possible sources from the sample) and (b) because it does not tend to scatter sources to higher photoz, which increases the weight artificially at the expense of biasing the signal. We emphasize that this higher weight for the two photoz methods does not mean that the error on ΔΣ is lower with these methods, because it may be due purely to the overestimate of Σ_c^{-1}. In section 5.8 we will address the effect of using photoz's on the statistical error in ΔΣ.

Figure 6. For each photoz method described in the text (in columns labeled according to the method), the top row shows the redshift histogram determined using the photoz (thin black line) and using the spectroscopic redshift (thick red line). The spectroscopic redshift histograms are not quite identical for all methods because we exclude photoz failures for each method and because kphotoz was only used for those galaxies with r < 21. The lower three panels show photometric redshift errors, for galaxies divided by apparent magnitude: r < 20 in the second row, 20 ≤ r < 21 in the third row, and r ≥ 21 in the fourth row. The points correspond to individual galaxies in the source catalog with spectra; the 68% confidence limits on the photoz error are shown as red solid lines. There are also green dashed lines indicating zero error and the lower limit on the error given that the photoz must exceed zero.

Given that kphotoz has a photoz error (r < 21 only) of similar size to that of the other photoz methods for the full source sample (all magnitudes), it is important to understand why the lensing calibration bias is so much worse for this method. This occurs because the r < 21 sample is at lower mean redshift. Since those sources are closer on average to the lens redshift, the same size photoz error translates to a larger error in Σ_c.
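To make this point concrete, the sketch below computes the fractional change in Σ_c^{-1} (through D_ls/D_s) when a source redshift is overestimated by a fixed δz, assuming a flat ΛCDM cosmology with Ω_m = 0.3; the function names are hypothetical. For a lens at z_l = 0.2, the same δz = 0.05 produces a much larger fractional change for a source at z_s = 0.3 than for one at z_s = 0.5.

```python
import numpy as np


def comoving_distance(z, omega_m=0.3, n_grid=4096):
    """Comoving distance in units of c/H0, assumed flat LCDM (trapezoidal rule)."""
    zgrid = np.linspace(0.0, z, n_grid)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zgrid) ** 3 + 1.0 - omega_m)
    return float(np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zgrid)))


def sigma_crit_fractional_error(z_lens, z_src, dz):
    """Fractional change in Sigma_c^{-1} when z_src is overestimated by dz."""
    chi_l = comoving_distance(z_lens)

    def geom(zs):  # D_ls / D_s = 1 - chi_l / chi_s in a flat universe
        return 1.0 - chi_l / comoving_distance(zs)

    return geom(z_src + dz) / geom(z_src) - 1.0


near = sigma_crit_fractional_error(0.2, 0.3, 0.05)  # source just behind the lens
far = sigma_crit_fractional_error(0.2, 0.5, 0.05)   # source well behind the lens
```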

To understand the results, we consider a fixed lens redshift of z_l = 0.2, and show the redshift bias as a function of true source redshift for each method in Fig. 8 (again, with lensing weight as a function of source redshift as in Section 4.4). Clearly, all source redshift bins with z_s < 0.2 must give b_z = -1, because the sources are not lensed. Above z_s = z_l = 0.2, the calibration bias is no longer fixed at -1, but may be significantly negative due to scatter in the estimates of source redshift (near z_s = z_l, the derivative dΣ_c/dz_s is large, so photoz errors are very important). As the source redshift increases, the same photoz error becomes less important because that derivative decreases, so the calibration bias approaches zero. The other important quantity to consider is the weight in each source redshift bin; if the source redshift bins with significant bias are given little weight, then the bias does not matter. If there is no weight for z_s < 0.2, then none of the galaxies with true z_s < 0.2 have had their photoz overestimated to above 0.2. This plot makes it clear that part of the reason for the significant bias for the NN, kphotoz and ZEBRA/SDSS photoz's is that they give too much weight to z_s < 0.3. This is less of a problem for the template photoz's, so the calibration bias for this method is much smaller.

Figure 7. Redshift bias b_z(z_l) (top) and weight (bottom, arbitrary units) for many methods of source redshift determination as described in the text. To make the plot simpler to read, we have left off error bars except in one case, the ZEBRA/SDSS method, which is shown with an error bar in one direction to indicate the typical size of the uncertainty in b_z(z_l) for all the methods.

Finally, we show the resulting mean calibration bias
when these results are averaged over a lens redshift
distribution using Eq. (6). Errors are determined using the
prescription in Section 4.3. The lens redshift distributions
that we consider are as follows: "sm1"-"sm7" are the redshift
distributions for the seven stellar mass bins from Mandelbaum
et al. (2006d); "LRG" is the redshift distribution for the
spectroscopic LRGs, a volume-limited sample, used for lensing in
Mandelbaum et al. (2006); and "maxBCG" is the redshift
distribution of the SDSS maxBCG clusters (Koester et al. 2007a,b).
These nine lens redshift distributions are plotted in Fig. 9. The
stellar mass subsamples correspond roughly to luminosity samples
with r-band luminosities of 0.33, 0.53, 0.72, 1.1, 1.8, 3.0, and
4.7 L_*. The LRGs are red galaxies with typical luminosities of a
few L_*, and the maxBCG clusters are clusters selected from
imaging data with masses ≳ 5 × 10^13 h^-1 M_⊙.

Figure 8. Redshift bias b_z(z_s) (top) and weight (bottom,
arbitrary units) for fixed z_l = 0.2, for many methods of source
redshift determination as described in the text. Errorbars are
omitted to make the plot simpler to read.

The average redshift calibration biases ⟨b_z⟩ (defined in
Eq. 6) for the redshift determination methods shown in Fig. 7
and these nine lens redshift distributions are given in Table 3.
For the stellar mass subsamples, the bias becomes more significant
at higher stellar mass because of the higher mean lens redshift.
The maxBCG sample gives a bias similar to sm7 because of its
similar redshift range, and the LRG sample gives the worst bias
because it has the highest mean redshift. The only method for
which the trend is different is the template photoz code, for
which b_z(z_l) changes sign with redshift due to the different
trends of photoz error with redshift.
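A minimal sketch of the lens-redshift average follows. It assumes (this is our reading, not a transcription of Eq. 6) that each lens redshift is weighted by the lens redshift distribution p(z_l) times the total source weight W(z_l) available at that redshift. The b_z(z_l) and W(z_l) curves below are made up purely to show the trend seen in Table 3: if b_z(z_l) becomes more negative with z_l, a lens sample at higher mean redshift gets a more negative ⟨b_z⟩.

```python
import numpy as np


def lens_averaged_bias(p_zl, bz, weight):
    """<b_z> = sum p(z_l) W(z_l) b_z(z_l) / sum p(z_l) W(z_l) on a common z_l grid.

    The p(z_l) * W(z_l) weighting is an assumption made for this illustration.
    """
    p_zl, bz, weight = (np.asarray(a, dtype=float) for a in (p_zl, bz, weight))
    w = p_zl * weight
    return float(np.sum(w * bz) / np.sum(w))


zl = np.linspace(0.02, 0.4, 39)
bz = -0.6 * zl                # hypothetical b_z(z_l), more negative at higher z_l
weight = np.exp(-zl / 0.2)    # hypothetical total source weight, falling with z_l


def gaussian(mean, width):
    """Unnormalized Gaussian lens redshift distribution on the zl grid."""
    return np.exp(-0.5 * ((zl - mean) / width) ** 2)


low_z = lens_averaged_bias(gaussian(0.05, 0.02), bz, weight)   # nearby lens sample
high_z = lens_averaged_bias(gaussian(0.25, 0.05), bz, weight)  # distant lens sample
print(f"low-z lenses: {low_z:+.3f}, high-z lenses: {high_z:+.3f}")
```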

As shown, the NN/D1 photoz's give nominally worse
calibration bias than the NN/CC2 photoz's for lower-redshift
lens samples, and the reverse is true at higher redshift. This
trend is consistent with the difference between the two methods
in Fig. 7. We also performed the analysis with the DR5 NN
photoz's and found the lensing calibration bias for these lens
redshift distributions to be similar to the NN/CC2 calibration
biases, well within the 1σ errors. This result suggests that the
failure mode that scatters galaxies to 0.75 < z_p < 1 in the DR5
version was not a significant source of lensing calibration bias,
and that the overall positive photoz bias (present in all NN
photoz's tested in this paper) is the main cause.

Table 3. Average redshift bias ⟨b_z⟩ for the nine lens redshift distributions described in the text.

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | NN/D1 | ZEBRA/SDSS |
|---|---|---|---|---|---|---|---|
| sm1 | -0.033 ± 0.008 | 0.005 ± 0.009 | 0.004 ± 0.003 | 0.020 ± 0.003 | -0.039 ± 0.007 | -0.051 ± 0.007 | -0.018 ± 0.006 |
| sm2 | -0.043 ± 0.009 | 0.008 ± 0.011 | 0.005 ± 0.004 | 0.020 ± 0.004 | -0.048 ± 0.008 | -0.059 ± 0.008 | -0.022 ± 0.007 |
| sm3 | -0.057 ± 0.011 | 0.013 ± 0.013 | 0.006 ± 0.005 | 0.021 ± 0.004 | -0.059 ± 0.008 | -0.070 ± 0.008 | -0.029 ± 0.007 |
| sm4 | -0.077 ± 0.012 | 0.020 ± 0.015 | 0.007 ± 0.006 | 0.020 ± 0.005 | -0.075 ± 0.009 | -0.084 ± 0.009 | -0.038 ± 0.008 |
| sm5 | -0.104 ± 0.014 | 0.029 ± 0.019 | 0.010 ± 0.008 | 0.015 ± 0.005 | -0.096 ± 0.009 | -0.102 ± 0.009 | -0.053 ± 0.008 |
| sm6 | -0.136 ± 0.016 | 0.041 ± 0.025 | 0.014 ± 0.011 | 0.003 ± 0.007 | -0.124 ± 0.011 | -0.123 ± 0.011 | -0.074 ± 0.010 |
| sm7 | -0.169 ± 0.018 | 0.055 ± 0.033 | 0.022 ± 0.016 | -0.014 ± 0.009 | -0.155 ± 0.015 | -0.146 ± 0.015 | -0.099 ± 0.012 |
| LRG | -0.221 ± 0.022 | 0.069 ± 0.045 | 0.038 ± 0.022 | -0.037 ± 0.014 | -0.195 ± 0.021 | -0.171 ± 0.021 | -0.131 ± 0.018 |
| maxBCG | -0.171 ± 0.018 | 0.056 ± 0.034 | 0.023 ± 0.016 | -0.015 ± 0.009 | -0.158 ± 0.015 | -0.147 ± 0.015 | -0.101 ± 0.013 |

Table 4. Average redshift bias ⟨b_z⟩ in previous works using this source catalog when combining source samples.

| Lens sample | ⟨b_z⟩ |
|---|---|
| sm1 | -0.016 ± 0.008 |
| sm2 | -0.020 ± 0.009 |
| sm3 | -0.025 ± 0.011 |
| sm4 | -0.032 ± 0.013 |
| sm5 | -0.039 ± 0.016 |
| sm6 | -0.045 ± 0.020 |
| sm7 | -0.046 ± 0.026 |
| LRG | +0.021 ± 0.038 |

Figure 9. Lens redshift distributions for the lens samples described in the text.

Finally, we consider what happens if we correct for the
mean photoz bias when estimating Σ_c for each source. For
the template photoz's, this correction causes the mean
calibration bias for sm7 to go from -0.014 to -0.14. This result
may be puzzling until we consider the effects of photoz bias
and scatter separately (Section 3.3). We know that photoz
scatter causes a negative calibration bias, and that a negative
photoz bias, such as this method has, causes a positive
calibration bias. When we did not correct for the mean photoz
bias, these two effects apparently cancelled out. This
cancellation is a non-trivial result that depends on our sample
selection: with a different cut on apparent magnitude, for
example, it is not clear that the effects would balance as
precisely. Once we have corrected for the mean photoz bias, we
are left with the suppression of the lensing signal due to the
photoz scatter. For the NN/CC2 and NN/D1 photoz's, the correction
for the mean photoz bias reduces the magnitude of the calibration
bias for sm7 from -0.16 and -0.15 to -0.10 and -0.07,
respectively (since the positive photoz bias and the scatter
change the lensing calibration in the same direction). For
ZEBRA/SDSS, the photoz bias was slightly negative, so correcting
for it worsens the lensing calibration bias, as for the template
photoz's, but only slightly: from -0.099 to -0.125 for sm7.

From these results, we conclude that once the effects of the
mean photoz bias are removed, the effects of photoz scatter on
the lensing calibration are smallest for the SDSS NN/D1 photoz's,
followed by SDSS NN/CC2 and ZEBRA/SDSS, and are largest for the
template photoz's. This ordering is consistent with the trends in
photoz scatter in Table 2. We therefore have two possible
procedures for handling calibration bias in the lensing signal:
(1) correct for the mean photoz bias before computing the lensing
signal, and then apply a correction to the lensing signal to
account for the residual calibration bias due to photoz scatter;
or (2) apply a single correction to the lensing signal for the
combined effects of photoz bias and scatter. In either case, we
must rely on our calibration subsample having the same properties
as the full source catalog, so that corrections derived using
this subsample apply to the full catalog.


5.4 Implications for previous work 

Here we determine the implications of Table 3 for previous
work with this lensing source catalog.

First, we consider the results for Mandelbaum et al.
(2006d), in which we divided the sample into stellar mass
and luminosity subsamples with the seven redshift distributions
sm1-sm7 shown in Fig. 9. For that work, the signal presented
was an average over the signal using the r < 21 and r > 21
source samples with 1/σ² weighting. To determine the average
bias on this signal, we use our bootstrap-resampled b_z(z_l)
and w(z_l), averaging the bias as a function of redshift for
each resampling using the weights for these two samples, and
then find the average over all the resampled datasets. The
average biases for sm1-sm7 are shown in Table 4.

We also consider the spectroscopic LRG lens redshift
distribution, which was used for lensing in Mandelbaum et al.
(2006) and Mandelbaum & Seljak (2007). In that case, we
detected a ~15% suppression of the lensing signal for the
r < 21 source sample relative to the r > 21 and LRG source
samples. Table 3 makes it clear that this suppression was, in
fact, real. To account for this suppression, we had multiplied
the signal and its error by a factor of 1.18. This is equivalent
to multiplying Σ_c by 1.18 when computing both the weights
(∝ Σ_c^-2) and the lensing signal. We thus incorporate this
factor into the computation of the bias b_z(z_l) before taking
the weighted average with the r > 21 sample. The average bias
once the correction factor is incorporated is shown in Table 4.
Because of this suppression of the weight in the r < 21 sample
due to the calibration factor, and because of its already low
weight relative to r > 21 for z_l > 0.22 (see Fig. 7), the
uncertainty on the calibration bias is actually dominated by the
larger r > 21 sample uncertainty, which is why it is larger than
one might naively expect from combining the results in Table 3
for r < 21 and r > 21. Clearly, this way of combining the signal
for r < 21 and r > 21 is non-optimal from the perspective of
constraining calibration bias.
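The effect of the 1.18 factor on the combined bias can be sketched for a single lens redshift. Multiplying Σ_c by a factor f for one source sample scales that sample's signal by f and its weights (∝ Σ_c^-2) by f^-2, so, assuming the combined signal is a weight-averaged combination of the two samples, 1 + b_comb = [W_1 f^-1 (1 + b_1) + W_2 (1 + b_2)] / [W_1 f^-2 + W_2]. The total weights W_1 and W_2 below are hypothetical, and the biases are simply the LRG-lens entries of Table 3 used as inputs to the arithmetic; the Table 4 value comes from a full average over z_l and bootstrap resamplings, which this sketch does not attempt to reproduce. The sketch also shows that a 15% suppression corresponds to a factor 1/(1 - 0.15) ≈ 1.18.

```python
def combined_bias(b1, w1, b2, w2, factor=1.0):
    """Calibration bias of a weight-averaged combination of two source samples.

    Sample 1 has its Sigma_c multiplied by `factor`: its signal scales by factor
    and its weights (proportional to Sigma_c^-2) by factor**-2.
    """
    w1_eff = w1 / factor ** 2
    numerator = w1_eff * factor * (1.0 + b1) + w2 * (1.0 + b2)
    return numerator / (w1_eff + w2) - 1.0


suppression = 0.15
print(f"1 / (1 - {suppression}) = {1.0 / (1.0 - suppression):.3f}")

# LRG-lens entries of Table 3 for r < 21 and r > 21; the weights are hypothetical.
b_bright, b_faint = -0.221, 0.069
w_bright, w_faint = 1.0, 2.0
print(f"no factor:   {combined_bias(b_bright, w_bright, b_faint, w_faint):+.3f}")
print(f"factor 1.18: {combined_bias(b_bright, w_bright, b_faint, w_faint, 1.18):+.3f}")
```

The factor both raises the r < 21 signal and down-weights that sample, which is why the combined uncertainty ends up dominated by the r > 21 sample.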

No results are shown for the maxBCG lensing sample 
because none of the previous works using this source catalog 
have used it. 

It is clear from Table 4 that there was statistically
significant redshift calibration bias in previous works using
this source catalog. However, the magnitude of the bias is
below the statistical error on the lensing signal in those
works, and is smaller than the generous 8% (1σ) systematic
error that was used for those science results. We conclude
that there is no cause for concern in using the results of our
previous work with this catalog without applying a correction.

5.5 Systematics: targeting and redshift failure 

In the previous sections, all quoted calibration errors were 
statistical. Here, we consider the size of systematic errors. 

First, we include the DEEP2 redshift failures in the
sample, once putting them all at z = 0 and then all at
z = 1.5 (with an LSS weight of 1). We showed earlier that the
failures have a similar SDSS magnitude and colour distribution
to the remainder of the sample. This statement is also true in
the DEEP2 BRI photometry, which places these galaxies without
spectroscopic redshifts in the z < 0.7 colour locus (like those
with successful redshift determination). Consequently, placing
them all at z = 0 and at z = 1.5 gives extremely conservative
bounds on the systematic error due to these redshift failures.
Table 5 shows the new ⟨b_z⟩ and the change in ⟨b_z⟩ compared to
Table 3 for all methods of source redshift determination,
including the combined r < 21 and r > 21 method used in our
previous work (Sec. 5.4), for four lens redshift distributions:
sm1, sm4, sm7, and LRG, which are at progressively higher
redshifts.

As shown in Table 5, these extreme assumptions change
our estimated calibration bias at the < 5σ level, and in most
cases by < 1σ. Since the real effect is likely many times
smaller than this (the failures roughly follow the magnitude and
colour distribution of the successes, and therefore likely their
redshift distribution as well), this systematic is far below our
1σ uncertainty on the calibration bias, and we conclude that
systematic effects due to the excluded DEEP2 redshift failures
are negligible.
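The numbers in parentheses in Table 5 express each shift in units of the statistical error from Table 3. As a quick check of that convention, the sketch below recomputes a few of them from the tabulated values; the helper name is ours.

```python
def shift_in_sigma(new, old, sigma):
    """Change in <b_z> relative to the statistical error of the original value."""
    return (new - old) / sigma


# (new value from Table 5, original value and its error from Table 3)
checks = {
    "sm1, r < 21, failures at z = 0": (-0.036, -0.033, 0.008),       # Table 5 lists -0.38
    "sm1, template, failures at z = 0": (0.008, 0.020, 0.003),       # Table 5 lists -4.0
    "sm7, ZEBRA/SDSS, failures at z = 1.5": (-0.089, -0.099, 0.012),  # Table 5 lists 0.83
}
for label, (new, old, sigma) in checks.items():
    print(f"{label}: {shift_in_sigma(new, old, sigma):+.2f} sigma")
```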

We next consider the effects of using the zCOSMOS
photoz's for their redshift failures. As shown in Fig. 4, the
failures have similar colours and magnitudes to the successes,
so we do not anticipate that they will have a significantly
different photoz error distribution from the successes shown at
the bottom of that figure. To test the effect of using ZEBRA
photoz's for this 8% of the sample, we randomly select another
8% of the sample among the redshift successes and replace their
spectroscopic redshifts with their photoz's. We then compare the
resulting calibration biases ⟨b_z⟩ to the original ones. These
results (shown in Table 6) indicate that, for all methods of
source redshift determination and all lens redshift
distributions, the use of zCOSMOS photoz's for the 8% of the
zCOSMOS sample that lacks redshifts changes the results by well
below the 1σ statistical error. We conclude that systematic
errors in our results due to redshift failures in either survey
are unimportant, with the caveat that if the redshift failures
are a systematically different population from the successes,
this test would not uncover any resulting systematic error
(however, we have no evidence that this is the case).
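The zCOSMOS test amounts to swapping a random 8% of secure spectroscopic redshifts for the corresponding photoz's and then recomputing ⟨b_z⟩ with the modified redshifts treated as truth. A minimal sketch of the swap step, with hypothetical function and array names and a fixed random seed so that the replaced subset is reproducible:

```python
import numpy as np


def replace_fraction_with_photoz(z_spec, z_phot, fraction=0.08, seed=0):
    """Return a copy of z_spec in which a random `fraction` of the entries is
    replaced by the corresponding photometric redshifts, plus the mask used."""
    z_spec = np.asarray(z_spec, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    rng = np.random.default_rng(seed)
    n_replace = int(round(fraction * z_spec.size))
    idx = rng.choice(z_spec.size, size=n_replace, replace=False)  # no repeats
    mask = np.zeros(z_spec.size, dtype=bool)
    mask[idx] = True
    z_test = z_spec.copy()
    z_test[mask] = z_phot[mask]
    return z_test, mask


z_spec = np.linspace(0.1, 1.0, 1000)  # mock spectroscopic redshifts
z_phot = z_spec + 0.05                # mock photoz's with a constant offset
z_test, mask = replace_fraction_with_photoz(z_spec, z_phot)
print(mask.sum(), "of", z_spec.size, "redshifts replaced")
```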

One final systematic is that in DEEP2 EGS, roughly
4% of our source catalog at bright magnitudes (r < 18.5)
was not targeted. We must assess whether properly including
these galaxies would significantly change the results.
However, the small photoz error for bright objects, and their
low mean redshift, make this unlikely. In the SDSS, only a
subset of these galaxies have spectroscopy: those with
r < 17.7 (flux-limited) and fainter ones that are very red.
Since including these SDSS spectroscopic redshifts would create
a sample with a strange selection (lacking blue galaxies at
17.7 < r < 18.5), we instead take the spectroscopic galaxies
from zCOSMOS at r < 18.5, choose a random subset to account for
the smaller size of the DEEP2 sample, and add the resulting 42
galaxies to the DEEP2 sample. We then refit the redshift
histogram for DEEP2, obtaining new redshift distribution
parameters z_* = 0.312 ± 0.048, α = 2.14 ± 0.39, and
⟨z⟩ = 0.400 ± 0.025. The change in mean source redshift is well
within the errors in Table 1. When computing the mean redshift
bias using this augmented sample, we find that the changes are
even smaller than those shown in Table 5. This is not
surprising, because in that table we have taken redshift
failures and put them at very extreme redshifts, whereas here
we have added a comparable number of redshifts but with very
good photoz's.

5.6 Agreement between the two surveys 

As an additional systematics test, we compare the results
when doing the full analysis separately for each survey. In
this case, we use LSS weights derived using the redshift
histograms for each survey separately instead of using the
combined histogram. In Table 7 we show the results for each
survey separately, with the bottom section showing the
statistical significance of the difference.

The results in this table show apparently significant
discrepancies between the results with zCOSMOS and with
DEEP2 separately. The statistical significance of the
difference is < 2σ for the last four columns, which use the
full catalog, but > 2σ for the first column (which uses r < 21
only) and < 1σ for the second column (which uses r > 21 only).
This pattern suggests that we should focus on the r < 21 sample
to find the source of the discrepancy. We must understand this
discrepancy in order to assess whether the results of our final,
combined analysis are biased or their errorbars significantly
underestimated.

Table 5. Change in redshift bias ⟨b_z⟩ for all methods of source redshift determination (including the combined method for r < 21 and r > 21 used in previous work, Sec. 5.4) when putting all DEEP2 failures at z = 0 and at z = 1.5, as shown. The number given is the resulting redshift bias, and the number in parentheses is the change in the bias from Table 3 relative to the statistical error.

Failures placed at z = 0:

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | -0.036 (-0.38) | -0.017 (-2.4) | -0.002 (-2.0) | 0.008 (-4.0) | -0.062 (-1.9) | -0.029 (-1.8) | -0.028 (-1.5) |
| sm4 | -0.081 (-0.33) | -0.003 (-1.5) | 0.002 (-0.8) | 0.007 (-2.6) | -0.094 (-1.6) | -0.050 (-1.5) | -0.044 (-0.9) |
| sm7 | -0.173 (-0.22) | 0.032 (-0.7) | 0.017 (-0.3) | -0.027 (-1.4) | -0.166 (-1.0) | -0.111 (-1.0) | -0.060 (-0.5) |
| LRG | -0.224 (-0.14) | 0.045 (-0.5) | 0.033 (-0.2) | -0.050 (-0.9) | -0.218 (-0.8) | -0.143 (-0.67) | 0.005 (-0.4) |

Failures placed at z = 1.5:

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | -0.032 (0.13) | 0.009 (0.4) | 0.004 (0.00) | 0.022 (0.7) | -0.046 (0.43) | -0.015 (0.50) | -0.013 (0.38) |
| sm4 | -0.075 (0.17) | 0.027 (0.5) | 0.008 (0.17) | 0.024 (0.8) | -0.075 (0.56) | -0.034 (0.50) | -0.026 (0.46) |
| sm7 | -0.166 (0.17) | 0.074 (0.6) | 0.024 (0.13) | -0.005 (1.0) | -0.140 (0.73) | -0.089 (0.83) | -0.033 (0.50) |
| LRG | -0.217 (0.18) | 0.097 (0.6) | 0.042 (0.18) | -0.025 (0.9) | -0.186 (0.71) | -0.118 (0.72) | 0.043 (0.58) |

Table 6. Change in redshift bias ⟨b_z⟩ for all methods of source redshift determination when replacing 8% of the redshifts for zCOSMOS successes with their photoz's. The number given is the resulting redshift bias, and the number in parentheses is the change in the bias from Table 3 relative to the statistical error.

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | -0.033 (0.00) | 0.005 (0.00) | 0.004 (0.00) | 0.019 (-0.33) | -0.049 (0.00) | -0.018 (0.00) | -0.016 (0.00) |
| sm4 | -0.078 (-0.08) | 0.019 (-0.07) | 0.008 (0.17) | 0.020 (0.00) | -0.080 (0.00) | -0.039 (-0.13) | -0.032 (0.00) |
| sm7 | -0.170 (-0.06) | 0.055 (0.00) | 0.024 (0.13) | -0.013 (0.11) | -0.151 (0.00) | -0.098 (0.08) | -0.046 (0.00) |
| LRG | -0.221 (0.00) | 0.070 (0.02) | 0.041 (0.14) | -0.035 (0.14) | -0.201 (0.00) | -0.130 (0.06) | 0.022 (0.03) |

In Fig. 10 we show plots for r < 21 that shed light on
this discrepancy. The upper left panel shows p(z) for r < 21
for both surveys. As shown, the best-fit histograms are very
similar, but the LSS fluctuations are more pronounced than for
the full sample. The lower left panel shows the ratio of the
best-fit number predicted in zCOSMOS to the number in DEEP2
(normalized to the same total numbers of galaxies), with the
68% confidence region shown with dashed lines. This confidence
region, including both Poisson and sampling variance errors, was
determined as follows: for each survey, 200 bootstrap-resampled
redshift histograms were created and used to fit for dN/dz. We
then paired up the 200 best-fit N^(model)(z) from zCOSMOS and
from DEEP2 EGS, and computed the ratio for each pair. The 200
ratios were ranked, and the middle 68% were used to define the
68% confidence region. It is reassuring that at all redshifts,
this shaded region includes a ratio of 1. The scarcity of
redshifts at z > 0.6 causes the errorbars on the ratio to become
extremely large (well off the limits of the plot).
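The confidence-region recipe above (pair up 200 bootstrap resamplings from each survey, take ratios, keep the middle 68%) can be sketched as follows. The "fit" here is a stand-in: this sketch uses the normalized resampled histogram itself, whereas the text fits a model dN/dz to each resampled histogram. The mock redshift samples, bin edges and function name are assumptions.

```python
import numpy as np


def bootstrap_ratio_band(z_a, z_b, edges, n_boot=200, level=0.68, seed=0):
    """Per-bin confidence band on the ratio of normalized redshift histograms.

    Each survey is bootstrap-resampled n_boot times; resampling i of one survey
    is paired with resampling i of the other, the ratio is taken bin by bin, and
    the central `level` fraction of the ranked ratios defines the band. As a
    stand-in for the best-fit dN/dz of each resampling, this sketch uses the
    normalized resampled histogram itself.
    """
    rng = np.random.default_rng(seed)
    z_a, z_b = np.asarray(z_a, dtype=float), np.asarray(z_b, dtype=float)
    ratios = np.empty((n_boot, len(edges) - 1))
    for i in range(n_boot):
        a = rng.choice(z_a, size=z_a.size, replace=True)
        b = rng.choice(z_b, size=z_b.size, replace=True)
        n_a = np.histogram(a, bins=edges)[0] / a.size
        n_b = np.histogram(b, bins=edges)[0] / b.size
        with np.errstate(divide="ignore", invalid="ignore"):
            ratios[i] = n_a / n_b
    tail = 50.0 * (1.0 - level)
    lo, hi = np.nanpercentile(ratios, [tail, 100.0 - tail], axis=0)
    return lo, hi


rng = np.random.default_rng(3)
z_a = rng.gamma(4.0, 0.1, size=2000)  # mock sample for survey A (assumed)
z_b = rng.gamma(4.0, 0.1, size=1000)  # mock sample for survey B (assumed)
edges = np.linspace(0.0, 1.0, 11)
lo, hi = bootstrap_ratio_band(z_a, z_b, edges)
for left, lo_k, hi_k in zip(edges[:-1], lo, hi):
    print(f"{left:.1f}-{left + 0.1:.1f}: [{lo_k:.2f}, {hi_k:.2f}]")
```

Because the two mock samples share a parent distribution, the band should straddle 1 in most well-populated bins, and it widens quickly where the counts are small, as at z > 0.6 in the real data.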

The top right panel in Fig. 10 shows b_z(z_s) for several
lens redshifts. As shown, these results are very similar for
the two surveys. The bottom right panel shows the fractional
weight w(z_s) for each lens redshift and survey. In principle,
the LSS weighting was designed to ensure that these curves
would not have structure due to LSS fluctuations in number
density as a function of redshift. We can see (particularly for
z_l = 0.3) that the curves for each survey are quite different
and have significant LSS fluctuations, so we must understand
why this is the case. We have ascertained that if we use b_z(z_s)
from DEEP2 with the weight w(z_s) from zCOSMOS, we recover the
same ⟨b_z⟩ as when we use both b_z(z_s) and w(z_s) from zCOSMOS,
implying that the weight differences cause the discrepancy in
⟨b_z⟩.

To understand this problem, we consider only sources with
0.3 ≤ z_s < 0.35. As shown with arrows, for z_l = 0.3, the
weight in this bin is a factor of ~4 higher in zCOSMOS than in
DEEP2. We have confirmed that this bin alone is a significant
reason why the average calibration bias is more negative for
zCOSMOS than for DEEP2. There are 179 and 21 galaxies at r < 21
in this bin in zCOSMOS and DEEP2, respectively. Using the LSS
weights derived for each survey separately, we weight zCOSMOS
and DEEP2 by factors of 0.8 and 2.25, giving weighted numbers of
galaxies of 143 and 63. Thus, the weighted ratio
N(zCOSMOS)/N(DEEP2) ≈ 2.3, whereas the expected value is 1.85
given the total number of galaxies in each survey. This ratio of
2.3 therefore represents a 23% enhancement of zCOSMOS relative to
DEEP2, due to the fact that the LSS weights were derived using
all galaxies in each survey, not just those at r < 21 that we use
here. While we can therefore conclude that LSS weighting may need
to be done as a function of apparent magnitude, this 23%
enhancement in source number does not account for a factor of 4
enhancement in the weights.

Figure 11 shows the photoz distribution p(z_p) from kphotoz
for the r < 21 sources in this narrow redshift slice in each
survey. It is important to note that our past analyses have
required z_p > z_l + 0.1. The photoz distributions for the
zCOSMOS and DEEP2 galaxies in this redshift slice are quite
different, with the DEEP2 distribution skewed to lower photoz
and the zCOSMOS one to higher photoz. Consequently, forty of the
179 zCOSMOS galaxies pass this photoz cut (23%), as compared with
two of the 29 DEEP2 galaxies (7%). This gives an additional
factor of 23/7 ≈ 3.2 enhancement of the weight in zCOSMOS on top
of the previous factor of 1.2. Together, the two factors give
nearly the factor of four enhancement in weight that we
identified in Fig. 10 as the source of the discrepancy.
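The bookkeeping behind "nearly a factor of four" can be checked directly from the numbers quoted above (weighted counts 143 and 63, expected ratio 1.85, and 40 of 179 versus 2 of 29 galaxies passing the photoz cut); the variable names are ours.

```python
weighted_zcosmos, weighted_deep2 = 143.0, 63.0
expected_ratio = 1.85                  # from the total numbers of galaxies per survey

# Enhancement from the LSS weights alone (weighted ratio relative to expectation).
number_enhancement = (weighted_zcosmos / weighted_deep2) / expected_ratio

# Enhancement from the z_p > z_l + 0.1 cut (fraction of each survey that passes).
pass_zcosmos = 40 / 179
pass_deep2 = 2 / 29
cut_enhancement = pass_zcosmos / pass_deep2

total = number_enhancement * cut_enhancement
print(f"number enhancement:     {number_enhancement:.2f}")
print(f"photoz-cut enhancement: {cut_enhancement:.2f}")
print(f"total:                  {total:.2f}")
```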

Table 7. Redshift bias ⟨b_z⟩ for each survey separately. The number given is the resulting redshift bias with its statistical error. The bottom section gives the statistical significance of the difference in units of standard deviations.

zCOSMOS:

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | -0.045 ± 0.013 | -0.001 ± 0.012 | -0.005 ± 0.007 | 0.020 ± 0.005 | -0.051 ± 0.009 | -0.021 ± 0.009 | -0.026 ± 0.011 |
| sm4 | -0.101 ± 0.020 | 0.005 ± 0.024 | -0.009 ± 0.014 | 0.019 ± 0.007 | -0.089 ± 0.011 | -0.044 ± 0.011 | -0.053 ± 0.018 |
| sm7 | -0.222 ± 0.029 | 0.013 ± 0.063 | -0.016 ± 0.033 | -0.026 ± 0.019 | -0.179 ± 0.027 | -0.109 ± 0.025 | -0.097 ± 0.044 |
| LRG | -0.295 ± 0.034 | 0.011 ± 0.090 | -0.012 ± 0.045 | -0.059 ± 0.030 | -0.242 ± 0.040 | -0.146 ± 0.038 | -0.048 ± 0.069 |

DEEP2 EGS:

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | -0.013 ± 0.006 | 0.007 ± 0.016 | 0.013 ± 0.004 | 0.018 ± 0.004 | -0.048 ± 0.010 | -0.012 ± 0.007 | -0.003 ± 0.010 |
| sm4 | -0.036 ± 0.009 | 0.028 ± 0.026 | 0.025 ± 0.009 | 0.022 ± 0.006 | -0.070 ± 0.012 | -0.031 ± 0.009 | -0.003 ± 0.018 |
| sm7 | -0.071 ± 0.017 | 0.091 ± 0.053 | 0.062 ± 0.021 | 0.003 ± 0.013 | -0.112 ± 0.022 | -0.085 ± 0.018 | 0.023 ± 0.040 |
| LRG | -0.075 ± 0.025 | 0.122 ± 0.072 | 0.089 ± 0.030 | -0.007 ± 0.019 | -0.145 ± 0.032 | -0.111 ± 0.025 | 0.113 ± 0.060 |

Statistical significance of difference (in units of σ):

| Lens sample | r < 21 | r > 21 | LRG | template | NN/CC2 | ZEBRA/SDSS | Previous work |
|---|---|---|---|---|---|---|---|
| sm1 | 2.23 | 0.40 | 2.23 | 0.31 | 0.22 | 0.79 | 1.55 |
| sm4 | 2.96 | 0.65 | 2.04 | 0.33 | 1.17 | 0.91 | 1.96 |
| sm7 | 4.49 | 0.95 | 1.99 | 1.26 | 1.92 | 0.78 | 2.02 |
| LRG | 5.21 | 0.96 | 1.87 | 1.46 | 1.89 | 0.77 | 1.76 |

Figure 11. Photoz distribution from kphotoz (top) and colour-magnitude information (bottom) for DEEP2 and zCOSMOS sources with 0.3 ≤ z_s < 0.35. In the bottom panel, we show the g - i colour and r-band magnitude; red crosses are DEEP2 and black hexagons are zCOSMOS.

Having accounted for the source of the problem, we
must understand why the photoz distributions look so
different for the two surveys. The bottom panel of Fig. 11
gives colour-magnitude information for these r < 21,
0.3 ≤ z_s < 0.35 galaxies in the two surveys. As shown, the
DEEP2 galaxies are both fainter and bluer on average than those
in zCOSMOS at this redshift. This is consistent with the fact
that the redshift histograms show a local underdensity in DEEP2
and a significant overdensity in zCOSMOS at this redshift, since
overdense regions contain a larger fraction of red galaxies. We
have found that for this photoz method, the photoz's are biased
low for blue galaxies, but not for red galaxies. Hence, the
different photoz distributions in the top panel of Fig. 11
reflect the different mixes of spectral types and the different
S/N of the galaxy detections in the two surveys at this source
redshift, rather than some more ominous effect such as
differences in photometric calibration across the SDSS survey
area.

We have confirmed that similar effects are at play in
other parts of the source redshift distribution (e.g.
0.6 ≤ z_s < 0.65) that show significant differences in weight
between the two surveys in Fig. 10. In short, the different
redshift biases in the two surveys are caused by the interplay
between large-scale structure and photoz errors: LSS emphasizes
certain spectral types that have different photoz error
properties. (An explicit demonstration of how this effect can
come about is given in Section 5.11, where we show photoz error
distributions for ZEBRA/SDSS as a function of colour and
magnitude.) Even in the absence of our z_p > z_l + 0.1 cut, the
mean estimated Σ_c^-1 would have been much higher in zCOSMOS than
in DEEP2, giving the same sign of the discrepancy between the
surveys as we have now (except that in that case, both b_z(z_s)
and w(z_s) would differ, not just w(z_s)). This interplay between
photoz's and LSS is a problem when trying to estimate the bias
due to redshift calibration with a relatively small subsample of
redshifts (~1000) on a small area of the sky. It is also
avoidable in principle, if we use our sample with spectroscopic
redshifts to derive photoz error distributions as a function of
colour and magnitude, which may be used to obtain an accurate
dp/dz for each object.

To confirm these findings, we have boxcar-smoothed the
weights w(z_s) shown in Fig. 10 with smoothing lengths of
Δz_s = 0.1, 0.15, and 0.2 for z_l = 0.1, 0.2, and 0.3 (larger
smoothing lengths are chosen for higher z_l because the LSS
fluctuations in w(z_s) are more significant there). The
resulting weight functions are reasonably smooth, as shown in
Fig. 12, but include some apparent mean offset in the redshift
distributions for the two surveys. We find that the discrepancy
between ⟨b_z⟩ for the two surveys is 5%, 15%, and 50% smaller for
z_l = 0.1, 0.2, and 0.3, respectively, than when using the
unsmoothed w(z_s). Most of the change arises from the DEEP2 mean
calibration bias going to lower (more negative) values, with the
zCOSMOS mean calibration bias changing only slightly. The
apparent 5σ discrepancy in Table 7 for LRG lenses is thus reduced
by this smoothing to a 2.5σ discrepancy, with the remaining
discrepancy presumably due to the offset in the weight histograms
shown in Fig. 12.
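A boxcar smoothing of a binned weight function w(z_s) with smoothing length Δz_s might look like the sketch below. The 0.05 bin width, the edge handling (truncating and renormalizing the window at the ends of the redshift range), the mock weight curve with ~30% fluctuations, and the reuse of one mock curve for all three lens redshifts are assumptions for illustration, not details given in the text.

```python
import numpy as np


def boxcar_smooth(w, bin_width, smoothing_length):
    """Boxcar-average a binned weight function w(z_s).

    The window spans about smoothing_length / bin_width bins; near the ends
    of the redshift range the window is truncated and renormalized.
    """
    w = np.asarray(w, dtype=float)
    n = max(1, int(round(smoothing_length / bin_width)))
    kernel = np.ones(n)
    summed = np.convolve(w, kernel, mode="same")
    counts = np.convolve(np.ones_like(w), kernel, mode="same")  # bins inside each window
    return summed / counts


bin_width = 0.05
zs = np.arange(30) * bin_width + bin_width / 2                   # bin centres, 0.025 to 1.475
rng = np.random.default_rng(7)
smooth_w = zs ** 2 * np.exp(-zs / 0.15)                          # mock smooth weight curve
noisy_w = smooth_w * (1.0 + 0.3 * rng.standard_normal(zs.size))  # mock ~30% LSS-like wiggles

# Smoothing lengths quoted in the text for z_l = 0.1, 0.2 and 0.3.
smoothed = {zl: boxcar_smooth(noisy_w, bin_width, dz) for zl, dz in [(0.1, 0.1), (0.2, 0.15), (0.3, 0.2)]}
for zl, w_s in smoothed.items():
    print(zl, np.round(w_s[4:10] / w_s.max(), 2))
```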

Figure 10. Results for r < 21 only, for each survey separately, as described in more detail in the text of Section 5.6. We show the best-fit and observed redshift histograms (upper left); the ratio of the best-fit redshift distributions, with the shaded 68% CL region (lower left); and the lensing calibration bias b_z (upper right) and the lensing weight as a function of source redshift (lower right) for three lens redshifts.

We now ask whether the LSS fluctuations are the cause of
the 2σ discrepancy with the other photoz methods. As we
will show later for ZEBRA/SDSS, and have confirmed for the
template and neural net photoz algorithms (but do not show
here), these photoz algorithms generally tend to underestimate
the photoz's for blue galaxies and slightly overestimate them
for red galaxies. Consequently, the same effect occurs when the
mixes of spectral types differ between the two surveys, even
when we are using another photoz algorithm, and this is evident
in w(z_s) for each survey. Using the same method of boxcar
smoothing the weight as a function of redshift for each survey,
we therefore estimate that the 2σ discrepancies for these
methods are really 1σ.

We now address another unusual feature of the calibration
uncertainties in Table 7: the uncertainties are actually
smaller for DEEP2 than for zCOSMOS (only slightly larger than
for the combined sample), despite the fact that sampling
variance is ~20% larger for DEEP2 EGS than for zCOSMOS! This
result is also due to the LSS fluctuations in the weights for
both surveys. The DEEP2 mean calibration bias was, as we saw
previously, significantly affected by this problem, and the
problem is also responsible for making the errorbars
artificially small (since our method of computing the errors
does not allow w(z_s) to vary as much as it should in reality).
So our worst-case 2.5σ and 1σ calibration differences for LRG
lenses (with kphotoz and with the other photoz methods,
respectively) are actually much less significant than these
numbers suggest, and therefore not a problem.

We must ask whether this effect means that our mean
results are biased or our errorbars are too optimistic when
using the combined sample of galaxies from the two surveys.
However, we are fortunate to be able to combine large samples
at completely different points on the sky. The total (sample
variance + Poisson) errors when using two uncorrelated fields
with N_1 and N_2 galaxies are smaller than if we simply had a
single field on the sky with N_1 + N_2 galaxies (which would be
correlated with each other).
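One way to see why two independent fields help is a toy model in which the mean of some quantity measured in a field has variance σ²/N from Poisson noise plus a sample-variance term σ_sv² that does not shrink with N. Combining two uncorrelated fields, weighted by their galaxy counts, averages down the sample-variance terms, whereas a single field with N_1 + N_2 galaxies keeps its full σ_sv². The counts, per-galaxy scatter and sample-variance values below are made-up illustrative numbers, and we assume, as a simplification, that the single field has the sample variance of the noisier of the two fields.

```python
def variance_single_field(n, sigma, sigma_sv):
    """Toy variance of a field mean: Poisson term plus a sample-variance floor."""
    return sigma ** 2 / n + sigma_sv ** 2


def variance_two_fields(n1, n2, sigma, sv1, sv2):
    """Toy variance of the count-weighted mean of two uncorrelated fields."""
    f1, f2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return f1 ** 2 * variance_single_field(n1, sigma, sv1) + f2 ** 2 * variance_single_field(n2, sigma, sv2)


n1, n2 = 1500, 1300         # illustrative galaxy counts
sigma = 0.3                 # illustrative per-galaxy scatter
sv_a, sv_b = 0.010, 0.012   # illustrative sample-variance terms (second ~20% larger)
one = variance_single_field(n1 + n2, sigma, sv_b)
two = variance_two_fields(n1, n2, sigma, sv_a, sv_b)
print(f"single field: {one:.3e}, two fields: {two:.3e}")
```

In this toy model the Poisson parts are identical (both reduce to σ²/(N_1 + N_2)), so the whole gain comes from the sample-variance term.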

Figure 12. Smoothed weight as a function of source redshift for several lens redshifts in DEEP2 and zCOSMOS, to minimize the effects of LSS. This plot is a smoothed version of the lower right panel of Fig. 10, with the same line types and colours as in that plot. The smoothing algorithm is described in the text.

A comparison of Fig. 10 with Fig. 8 can help us answer
this question. In Fig. 10, it is clear that the weight as a
function of source redshift w(z_s) for z_l = 0.2 is not smooth
at all, due to the LSS-photoz error correlation in each survey.
The fluctuations are at times ~30% off from the value one might
expect if the curve were smooth. However, in Fig. 8 these curves
for the combined sample are significantly smoother, with
fluctuations that are at most 20% for the LRG sources (the
smallest and most highly clustered sample) and even less, ~10%,
for the other samples. We thus conclude that the effect is
reduced by a factor of ~3 and is therefore negligible for the
combined sample. To verify this conclusion, we have performed the
same boxcar smoothing of the weight functions in Fig. 8, with the
same smoothing lengths as for the two survey subsamples, and
found that the resulting redshift calibration biases ⟨b_z⟩ for
the combined sample changed by < 0.5% for sm1-sm5 and by < 1% for
sm6, sm7, LRG, and maxBCG lenses. These changes are well within
the 1σ errors on the calibration bias for these lens samples.

Finally, we notice in the top panel of Fig. 11 that our
naive requirement that z_p > z_l + 0.1 has forced us to ignore
a significant majority of the galaxies in this redshift slice,
all of which are actually lensed. Since z_l = 0.3 and the sources
are all at true redshifts z_s ≥ 0.3, we could conceivably use
them all for lensing; using only the subset at z_p > 0.4
eliminates a large fraction of these sources. We return to this
point in Sections 5.8 and 5.10.

5.7 Size of errorbars on calibration bias 

While we have previously asserted (Section 4.3) that
correlations between the bins in the redshift histograms should
be negligible, we now present tests of this assertion, which
(if violated) could cause the errorbars to be underestimated.
One way it might be violated is through a supercluster that
happens to lie partially within two histogram bins instead of
entirely within one. While such a large LSS fluctuation is
unlikely in an area of such small comoving volume, we
nonetheless present tests of this possibility.

As an example of a candidate supercluster, we find a large overdensity with 0.34 < z < 0.38 in zCOSMOS. By plotting the detailed redshift distribution in this region, we see that there are, in fact, ~3 large overdensities with line-of-sight separations of ~80 $h^{-1}$ Mpc between them. Clusters with such a large separation are unlikely to be correlated: the correlation function for dark matter at this separation is ~$10^{-3}$, so the clusters would need a bias of ~30 for their correlation to become appreciable relative to a random distribution. There should be fewer than one cluster with such a high bias in the observable universe. While magnification bias may increase the probability by a factor of a few (Hui et al. 2007), it does so by invoking the cross-correlation between mass and galaxies, so one loses one power of the bias, which therefore cannot bring the correlations to a level comparable to unity. These galaxy bias and magnification bias effects are difficult to simulate realistically, so we cannot turn to simulations to solve this problem.
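The order-of-magnitude argument above can be checked with simple arithmetic. This sketch assumes the linear-bias scalings $\xi_{cc} \approx b^2\xi_{dm}$ for a cluster-cluster correlation and $\xi_{gm} \approx b\,\xi_{dm}$ for a mass-galaxy cross-correlation, and takes $\xi_{dm} \approx 10^{-3}$ at ~80 $h^{-1}$ Mpc; `required_bias` is a hypothetical helper.

```python
def required_bias(xi_dm, power=2):
    """Bias b such that b**power * xi_dm = 1.

    power=2: cluster-cluster correlation, xi_cc = b^2 xi_dm
    power=1: mass-galaxy cross-correlation (magnification), xi_gm = b xi_dm
    """
    return (1.0 / xi_dm) ** (1.0 / power)


xi_dm = 1e-3  # assumed dark-matter correlation at ~80 Mpc/h
print(f"bias needed for b^2 xi ~ 1: {required_bias(xi_dm):.1f}")
# With only one power of the bias, even b ~ 30 leaves the correlation far below unity
print(f"b * xi for b = 30: {30 * xi_dm:.3f}")
```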

To test the effects on the errorbars of the best-fit redshift distribution and on the final calibration bias, we redo the analysis using bins of size $\Delta z = 0.1$, which places these structures all in one bin. We find that for zCOSMOS, this procedure increases the errors on the final results by 30%, whereas the errors for DEEP2 and for the combined sample (DEEP2 + zCOSMOS) are essentially unaffected.

As an additional test, we shift the original histogram bins by −0.02 in redshift, so that all three structures fall into the bin 0.33 ≤ z < 0.38. We find that while the best-fit redshift histogram is unaffected, the errors on it are significantly increased (by nearly a factor of 2 in the bins near this LSS fluctuation, and by a smaller factor further away from it). To understand why this has such a large effect, consider that it adds an excess number of galaxies $\Delta N$ to the histogram in that one bin. The penalty on the fit $\chi^2$ (Eq. 7) is therefore proportional to $(\Delta N)^2$. If instead the fluctuation is split equally between two bins (as we had effectively been doing before), the excess number of galaxies in each bin is $0.5\Delta N$, leading to a $\Delta\chi^2$ penalty proportional to $2(0.5\Delta N)^2 = 0.5(\Delta N)^2$, half as much as if the entire overdensity is in one bin. The effect when fitting to the shifted histogram using both surveys together is nearly the same as when fitting zCOSMOS alone, whereas the errors for DEEP2 alone are unaffected (because our contrived bin-shifting did not correlate with any LSS fluctuations in DEEP2).
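The $\chi^2$ bookkeeping behind the factor of two can be written out explicitly. Following the simplified argument in the text, this sketch assumes unit variance in each affected bin, so an excess $\Delta N$ spread evenly over $k$ bins costs $k(\Delta N/k)^2 = (\Delta N)^2/k$; the function name and the value of $\Delta N$ are illustrative only.

```python
def chi2_penalty(delta_n, n_bins=1):
    """Chi^2 penalty from an excess of delta_n galaxies split equally over n_bins.

    Unit variance per bin is assumed, as in the simplified argument in the text.
    """
    per_bin = delta_n / n_bins
    return n_bins * per_bin ** 2


dN = 40.0  # illustrative excess, not a measured value
print(chi2_penalty(dN, 1))  # whole overdensity in one bin
print(chi2_penalty(dN, 2))  # split equally over two bins: half the penalty
```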

Given that these structures are likely to be uncorrelated, our bin-shifting, which treated them as correlated, leads to over-estimated errors. On the other hand, our default binning places one of them in one histogram bin and the other two together in another; we may therefore suppose that our errors for zCOSMOS and the combined sample are, in fact, slightly over-estimated (since we effectively treated two of the structures as correlated). Clearly, the limited number of independent patches makes the error estimate from the bootstrap noisy, and while our final results may be treated as having conservative errorbars, we cannot exclude the possibility that they may be a factor of two larger. However, the finding that the zCOSMOS errorbars may be overestimated may also explain why, in the previous section, we found the calibration of the lensing signal to be more tightly constrained in DEEP2 than in zCOSMOS, despite the fact that DEEP2 is smaller.
Finally, we note that bootstrapping M data points M times will in general lead to a statistical uncertainty in the determined errors at the $1/\sqrt{M}$ level. For the case where we bootstrap a redshift histogram with 24 bins to get the best-fit redshift distribution, and use those results to get errors on the lensing signal calibration, the errors are therefore reliable at the ~20% level ($1/\sqrt{24} \approx 0.20$). This uncertainty is due to noise, rather than to violation of the bootstrap assumptions as in the rest of this section.
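The $1/\sqrt{M}$ rule quoted above is easy to evaluate, and a generic bootstrap error estimate is shown alongside for context. This is a sketch with hypothetical helper names (`bootstrap_std`, `fractional_noise_on_error`), using only the Python standard library; it is not the resampling code used for the histogram fits.

```python
import math
import random


def bootstrap_std(values, statistic, n_resamples=1000, seed=0):
    """Bootstrap standard error of `statistic` by resampling `values` with replacement."""
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        stats.append(statistic(resample))
    mean = sum(stats) / len(stats)
    var = sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)
    return math.sqrt(var)


def fractional_noise_on_error(m):
    """Rule of thumb from the text: errors from M bootstrapped points are uncertain at ~1/sqrt(M)."""
    return 1.0 / math.sqrt(m)


print(f"{fractional_noise_on_error(24):.2f}")  # 24 histogram bins
```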



5.8 Purity and completeness 

Here we address questions of purity and completeness of the source sample for each photoz method. We define purity as the fraction of the total estimated lensing weight that is attributed to sources with spectroscopic redshift above the lens redshift (i.e., that are truly lensed). Low purity would be associated with a strong negative calibration bias. Completeness can be defined by constructing analogues of the lensing weights defined earlier, but using the true $\Sigma_c$ rather than the estimated one. We then define a "true" weight $w_j$ for each object, and find the fraction of the total summed "true" weight that is carried by the sources treated as lensed by a given photoz method. Low completeness can occur because some photoz's scatter low, so that we wrongly assume those sources lie below the lens redshift.
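The purity and completeness definitions above reduce to ratios of weighted sums. A minimal sketch, assuming per-source arrays of estimated weights, "true" weights computed from the spectroscopic redshifts (zero for sources in front of the lens), and a mask of the sources a photoz method keeps; the function names and toy numbers are hypothetical.

```python
import numpy as np


def purity(w_est, z_spec, z_lens):
    """Fraction of the estimated lensing weight carried by sources truly behind the lens."""
    w_est = np.asarray(w_est, dtype=float)
    behind = np.asarray(z_spec) > z_lens
    return w_est[behind].sum() / w_est.sum()


def completeness(w_true, used):
    """Fraction of the 'true' lensing weight carried by sources the photoz method keeps.

    w_true: weights computed with the true Sigma_c (zero for sources in front of the lens)
    used:   boolean mask of sources that the photoz method treats as lensed
    """
    w_true = np.asarray(w_true, dtype=float)
    return w_true[np.asarray(used, dtype=bool)].sum() / w_true.sum()


# Toy example: four sources around a lens at z_l = 0.3
z_spec = np.array([0.25, 0.45, 0.60, 0.35])
w_est = np.array([0.5, 1.0, 1.2, 0.0])    # last source scattered low in photoz, so dropped
w_true = np.array([0.0, 0.9, 1.3, 0.4])   # zero weight in front of the lens
print(purity(w_est, z_spec, 0.3))         # (1.0 + 1.2) / 2.7
print(completeness(w_true, w_est > 0))    # (0.9 + 1.3) / 2.6
```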

These two issues, purity and completeness, are two of the three factors that determine the statistical error on the lensing signal $\Delta\Sigma$ for a given photoz method, compared with the statistical error in the optimal case where all lens and source redshifts are known. The final factor is how much a photoz method causes the weighting scheme to deviate from optimal weighting. We would like to estimate the total increase in the error on the lensing signal due to all three factors combined.

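To do so, we consider the lensing signal estimator in the optimal case where all lens and source redshifts are known. In that case, we have a shear $\gamma$, a critical surface density $\Sigma_c$, and weights $w = 1/(\Sigma_c\sigma_\gamma)^2$. (These weights are analogous to the lensing weights defined earlier, where $\sigma_\gamma$ comes from shape noise and measurement error added in quadrature.) In this ideal case, the lensing signal is

$$\Delta\Sigma = \frac{\sum w\,\Sigma_c\gamma}{\sum w} \tag{11}$$

and its variance, using $w\,\Sigma_c^2\sigma_\gamma^2 = 1$, is

$$\mathrm{var}_{\rm ideal}(\Delta\Sigma) = \frac{\sum w^2\Sigma_c^2\sigma_\gamma^2}{\left(\sum w\right)^2} = \frac{\sum w}{\left(\sum w\right)^2} = \frac{1}{\sum w}. \tag{12}$$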

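In reality, we have an estimated critical surface density $\hat\Sigma_c$, an estimated weight $\hat w = 1/(\hat\Sigma_c\sigma_\gamma)^2$, and a calibration bias $b_z$ defined via Eq. (5). We can relate the calibrated estimator to the true lensing signal,

$$\Delta\Sigma = \frac{\sum \hat w\,\hat\Sigma_c\gamma}{(1+b_z)\sum \hat w}, \tag{13}$$

so its variance is

$$\mathrm{var}_{\rm real}(\Delta\Sigma) = \frac{\sum \hat w^2\hat\Sigma_c^2\sigma_\gamma^2}{(1+b_z)^2\left(\sum \hat w\right)^2} = \frac{\sum \hat w}{(1+b_z)^2\left(\sum \hat w\right)^2}. \tag{14}$$

We then rearrange the definition of $b_z$ as follows:

$$1 + b_z = \frac{\sum \hat w\,\hat\Sigma_c/\Sigma_c}{\sum \hat w}. \tag{15}$$

Inserting this form for $1 + b_z$ into Eq. (14), and noting that $\hat w\,\hat\Sigma_c/\Sigma_c = 1/(\hat\Sigma_c\Sigma_c\sigma_\gamma^2) = \sqrt{\hat w\,w}$, we find that

$$\mathrm{var}_{\rm real}(\Delta\Sigma) = \frac{\sum \hat w}{\left(\sum\sqrt{\hat w\,w}\right)^2}. \tag{16}$$

Comparing Eqs. (12) and (16), we find that

$$\frac{\mathrm{var}_{\rm ideal}(\Delta\Sigma)}{\mathrm{var}_{\rm real}(\Delta\Sigma)} = \frac{\left(\sum\sqrt{\hat w\,w}\right)^2}{\left(\sum w\right)\left(\sum \hat w\right)}. \tag{17}$$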
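The step from Eq. (14) to Eq. (16) rests on the identity $\hat w\,\hat\Sigma_c/\Sigma_c = 1/(\hat\Sigma_c\Sigma_c\sigma_\gamma^2) = \sqrt{\hat w\,w}$. The numerical check below draws arbitrary values for $\Sigma_c$, $\hat\Sigma_c$ and $\sigma_\gamma$ (the distributions are illustrative choices, not survey data); the two expressions for the real variance should agree to floating-point precision, and the ratio of Eq. (17) should not exceed one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
sigma_c_true = rng.uniform(1.0, 5.0, n)                  # true Sigma_c (arbitrary units)
sigma_c_est = sigma_c_true * rng.uniform(0.7, 1.4, n)    # photoz-based estimate
sigma_gamma = rng.uniform(0.2, 0.4, n)                   # shape noise + measurement error

w = 1.0 / (sigma_c_true * sigma_gamma) ** 2              # ideal weights
w_hat = 1.0 / (sigma_c_est * sigma_gamma) ** 2           # estimated weights

one_plus_bz = np.sum(w_hat * sigma_c_est / sigma_c_true) / np.sum(w_hat)  # Eq. (15)

# Eq. (14): direct propagation of shape noise through the calibrated estimator
var_real_14 = np.sum(w_hat**2 * sigma_c_est**2 * sigma_gamma**2) / (
    one_plus_bz**2 * np.sum(w_hat) ** 2)
# Eq. (16): the same quantity written in terms of the weights only
var_real_16 = np.sum(w_hat) / np.sum(np.sqrt(w_hat * w)) ** 2
var_ideal_12 = 1.0 / np.sum(w)                           # Eq. (12)

print(np.isclose(var_real_14, var_real_16))
print(var_ideal_12 / var_real_16)                        # Eq. (17)
```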



This ratio is the square of an (uncentred) correlation coefficient between the square roots of the real and ideal weights for each lens-source pair. That coefficient lies between 0 and 1 (not between -1 and 1, as for correlation coefficients in general, since the weights are non-negative), so the ratio does too. It equals one only when the estimated weight $\hat w$ is strictly proportional to the ideal weight $w$. This is as it should be: the measured ("real") variance of the lensing signal using a given photoz method is always greater than or equal to the ideal variance. This expression encodes all three ways the real measurement can be degraded relative to the ideal one: loss of lensed sources, inclusion of sources that are not lensed, and non-optimal weighting. This statistic is therefore another lensing-optimized metric that can be used to classify photoz algorithms for g-g lensing purposes.
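Eq. (17) can be evaluated directly from paired arrays of ideal and estimated weights for the same lens-source pairs. The sketch below (the function name `variance_ratio` is hypothetical) exercises the two limiting cases discussed above: proportional weights, which give a ratio of one, and a mix of lost sources, unlensed sources and mis-weighting, which gives a ratio below one.

```python
import numpy as np


def variance_ratio(w_ideal, w_est):
    """Ideal/real variance of the lensing signal, Eq. (17).

    Both arrays run over the same lens-source pairs; weights must be non-negative.
    """
    w_ideal = np.asarray(w_ideal, dtype=float)
    w_est = np.asarray(w_est, dtype=float)
    if np.any(w_ideal < 0) or np.any(w_est < 0):
        raise ValueError("weights must be non-negative")
    cross = np.sum(np.sqrt(w_ideal * w_est))
    return cross**2 / (np.sum(w_ideal) * np.sum(w_est))


w_ideal = np.array([0.0, 1.0, 2.0, 3.0])
print(variance_ratio(w_ideal, 5.0 * w_ideal))                   # proportional weights: 1 up to rounding
print(variance_ratio(w_ideal, np.array([1.0, 1.0, 1.0, 0.0])))  # impure, incomplete, mis-weighted: below 1
```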

Fig. 13 shows the purities (bottom left), completenesses (top left), the variance ratio (top right), and the implied change in variance due to non-optimal weighting (bottom right) as a function of lens redshift for each method. We first consider the completeness as a function of lens redshift, in the top left panel of Fig. 13. The results for kphotoz confirm our previous finding that the combination of a broad photoz error distribution with our requirement $z_p > z_l + 0.1$ causes us to lose a significant fraction of the available lensing weight. The results for the LRG source sample confirm our previous assertion that the photoz's for these sources correctly place them all at high redshift, so that we lose essentially none of them. The template photoz completeness is ~80% on average, which is not surprising given the significant failure mode to $z_p = 0$ that causes us to lose some sources. The neural net photoz's (CC2 and D1) give the highest completeness of all the photoz methods considered here (except for the highly specialized LRG source sample), in part due to their positive mean photoz error.

In the lower left panel of Fig. 13 we see the purity as a function of lens redshift. The swiftly declining purity above $z_l = 0.2$ for kphotoz is the main cause of the large negative calibration bias for this method for higher-redshift lens samples, and results from the large photoz error coupled with a lower mean redshift for the r < 21 sample than for the full samples used with the other photoz methods. The LRG source sample purity is uniformly high, dropping from 1 at $z_l = 0$ to a minimum of 0.96 at $z_l = 0.35$. This result attests to the efficiency of the colour cuts in selecting only high-redshift sources, and to the narrow photoz error distribution. Of the other photoz methods, the template photoz has the highest purity; the tendency towards a positive photoz error seen previously for the NN and ZEBRA/SDSS photoz's causes a decline in purity with redshift (though it is also the cause of their relatively high completeness), just as it causes a negative calibration bias in the lensing signal.

Figure 13. Left: Completeness (top) and purity (bottom), as defined in the text, as a function of lens redshift. Top right: The resulting ratio of ideal to real variance for each method of source redshift determination. Bottom right: the derived change in variance due to non-optimal weighting. Redshift determination methods are as follows: solid black = kphotoz (r < 21), dotted red = r > 21 redshift distribution, dashed blue = high-redshift LRGs, long-dashed green = template photoz's, long-short-dashed magenta = NN/CC2 photoz's, long-short-dashed yellow = NN/D1 photoz's, and dot-dashed cyan = ZEBRA/SDSS.

The upper right panel of Fig. 13 shows the variance in the ideal case relative to the true variance that results from using a given photoz method. For kphotoz, this number drops as low as 0.2 for $z_l > 0.3$, implying that the errors are a factor of $\sqrt{1/0.2} \approx 2.2$ larger when using this photoz method than in the ideal case. ZEBRA/SDSS and the template photoz's give similar results for this parameter, from 0.85 at $z_l = 0$ to 0.5 at $z_l = 0.35$, implying errors ranging from 1.1 to 1.4 times the ideal. The NN photoz's give slightly better results than that, as does using a redshift distribution for r > 21 galaxies. The high-redshift LRGs naturally give errors nearly identical to those in the ideal case, because the sources are at redshifts significantly higher than the lenses, so any photoz errors cannot cause a significant deviation from optimal weighting.
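The conversion from the plotted variance ratio $r$ to an error inflation factor is $\sqrt{1/r}$; evaluating it for the values quoted above gives factors of roughly 1.1, 1.4 and 2.2. The helper name is hypothetical.

```python
import math


def error_inflation(variance_ratio):
    """Factor by which the errors exceed the ideal ones, given the ideal/real variance ratio."""
    if not 0.0 < variance_ratio <= 1.0:
        raise ValueError("variance ratio must lie in (0, 1]")
    return math.sqrt(1.0 / variance_ratio)


for r in (0.85, 0.5, 0.2):
    print(f"ratio {r:.2f} -> errors x{error_inflation(r):.2f}")
```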

Finally, the lower right panel shows the estimated change in variance due to non-optimal weighting, obtained by taking the variance ratio and dividing out the effects of impurity and incompleteness. The results suggest that for all source samples except the high-redshift LRGs, non-optimal weighting has a similar effect on the errors independent of photoz method, increasing them by ~7% at worst for this range of lens redshifts.

5.9 Using p(z) distributions

Here we consider the possibility of using a full redshift probability distribution, p(z), for each object, with two different sources of this distribution. The first is the posterior p(z) from the ZEBRA/SDSS method. For this method, p(z) is determined by marginalizing over templates T using the joint redshift-template prior P(z,T) and the likelihood L(z,T) from the fit $\chi^2$:

$$p(z) \propto \sum_T L(z,T)\,P(z,T). \tag{18}$$
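Eq. (18) amounts to a weighted sum over templates on a redshift grid. The sketch below assumes a Gaussian likelihood $L \propto \exp(-\chi^2/2)$, a tabulated joint prior, and normalisation of p(z) to unit integral with the trapezoidal rule; it illustrates the marginalisation only and is not the ZEBRA implementation. The function name, toy templates and flat prior are hypothetical.

```python
import numpy as np


def posterior_pz(chi2, prior, z_grid):
    """Template-marginalised posterior p(z) ∝ sum_T L(z,T) P(z,T), normalised on z_grid.

    chi2:  array (n_z, n_templates) of fit chi^2 values
    prior: array (n_z, n_templates), joint redshift-template prior
    """
    chi2 = np.asarray(chi2, dtype=float)
    prior = np.asarray(prior, dtype=float)
    z_grid = np.asarray(z_grid, dtype=float)
    like = np.exp(-0.5 * (chi2 - chi2.min()))   # subtract the minimum for numerical stability
    pz = np.sum(like * prior, axis=1)           # marginalise over templates
    norm = np.sum(0.5 * (pz[1:] + pz[:-1]) * np.diff(z_grid))
    return pz / norm


z_grid = np.linspace(0.0, 1.0, 101)
chi2 = np.stack([
    ((z_grid - 0.4) / 0.05) ** 2,        # template 0: best fit near z = 0.4
    ((z_grid - 0.7) / 0.05) ** 2 + 4.0,  # template 1: worse fit near z = 0.7
], axis=1)
prior = np.ones_like(chi2)               # flat prior, for illustration only
pz = posterior_pz(chi2, prior, z_grid)
```

A second, weaker peak near z = 0.7 survives in the result, which is how a single object's p(z) can become multimodal.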
Table 8. Average calibration bias ($b_z$) for several lens redshift distributions using the full posterior p(z) to get $\Sigma_c$. The errors are approximately the same for the two columns.

Lensing calibration bias
Lenses    ZEBRA/SDSS p(z)     NN p(z)
sm1        0.013 ± 0.006      -0.001
sm2        0.012 ± 0.007      -0.001
sm3        0.011 ± 0.007      -0.002
sm4        0.009 ± 0.008      -0.002
sm5        0.005 ± 0.008      -0.002
sm6       -0.002 ± 0.010      -0.003
sm7       -0.013 ± 0.014      -0.005
LRG       -0.032 ± 0.018      -0.007
maxBCG    -0.014 ± 0.013      -0.006


The second is a p(z) distribution determined using some of the machinery described in Oyaizu et al. (2008) but independently of the photoz determination in that paper. This photoz-independent estimate of p(z) (Cunha et al. 2007, in prep.) is calculated as follows. The training set, comprising 639 915 spectroscopic objects from a variety of surveys, is reweighted using the procedures in Oyaizu et al. (2007) and Lima et al. (2007, in prep.) to match the joint, 5-dimensional probability distribution of the source catalog for which we would like to obtain photoz's. The five parameters used to create this distribution are the u − g, g − r, r − i, and i − z colours and the r-band apparent magnitude. The redshift distribution of the weighted training set provides an estimate of the true underlying distribution of the photometric sample. The estimate of p(z) for each galaxy in the photometric sample is given by the weighted $z_{\rm spec}$ distribution of the 100 nearest training-set neighbors in colour/magnitude space (the same four colours and r-band magnitude mentioned above). Finally, to reduce the effects of Poisson noise, large-scale structure, and magnitude errors in the training sample, we adopt a "moving window" smoothing technique: we calculate p(z) in 140 bins in the redshift range 0 < z < 2 with a constant bin width of 0.067. The p(z) derived in this way will be referred to as the NN p(z), where NN in this context refers to "nearest neighbor" rather than "neural net."

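The nearest-neighbour p(z) described above can be sketched as follows. Several details are assumptions rather than statements from the text: neighbours are found with a Euclidean distance in the five-dimensional colour/magnitude space, the training-set weights are supplied externally, and the "moving window" is read as 140 overlapping windows of width 0.067 centred on an evenly spaced grid over 0 < z < 2. The function name and toy training set are hypothetical.

```python
import numpy as np


def nn_pz(target, train_features, train_z, train_weights, k=100,
          z_min=0.0, z_max=2.0, n_bins=140, window=0.067):
    """Nearest-neighbour estimate of p(z) for one photometric galaxy.

    target:         (5,) colours and magnitude of the galaxy
    train_features: (N, 5) same quantities for the training set
    train_z:        (N,) spectroscopic redshifts
    train_weights:  (N,) training-set weights
    Returns window centres and p(z) normalised to sum to one over the grid.
    """
    train_features = np.asarray(train_features, dtype=float)
    d2 = np.sum((train_features - np.asarray(target, dtype=float)) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]                   # indices of the k nearest neighbours
    z_nn = np.asarray(train_z, dtype=float)[nearest]
    w_nn = np.asarray(train_weights, dtype=float)[nearest]

    centres = np.linspace(z_min, z_max, n_bins)
    # Moving window: each centre collects neighbours within +/- window/2 (windows overlap)
    inside = np.abs(z_nn[None, :] - centres[:, None]) < 0.5 * window
    pz = inside.astype(float) @ w_nn
    total = pz.sum()
    return centres, (pz / total if total > 0 else pz)


rng = np.random.default_rng(42)
train_features = rng.normal(size=(2000, 5))                    # toy colours + magnitude
train_z = np.clip(0.3 + 0.1 * train_features[:, 0], 0.0, 2.0)  # toy colour-redshift relation
train_weights = np.ones(len(train_z))                          # stand-in for the reweighting
centres, pz = nn_pz(np.zeros(5), train_features, train_z, train_weights)
```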
In this section, we recompute $b_z(z_l)$ and its average over various lens redshift distributions, but instead of using the photoz to get $\Sigma_c(z_l, z_p)$, we integrate over the full p(z) (normalized to integrate to unity):

$$\hat\Sigma_c^{-1}(z_l) = \int_0^\infty p(z)\,\Sigma_c^{-1}(z_l, z)\,dz. \tag{19}$$

We then compare the results using the two estimates of p(z) to the results using the photoz alone. Figure 14 shows the calibration bias $b_z$ as a function of $z_l$ using the photoz's directly (as in Fig. 7) and using the full estimates of p(z). In Table 8 we show the calibration bias averaged over various lens redshift distributions (as in the corresponding earlier table for photoz's), using the full p(z). As shown in both the figure and the table, most of the calibration bias is eliminated when using the full p(z) from either method.
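Eq. (19) replaces $\Sigma_c^{-1}(z_l, z_p)$ by its average over p(z). The sketch below assumes a flat ΛCDM cosmology with $\Omega_m = 0.3$, distances in $h^{-1}$ Mpc and physical angular-diameter distances, and drops the constant $4\pi G/c^2$ (it cancels in calibration ratios); these choices and the helper names are assumptions, not the paper's exact conventions. The Gaussian p(z) is purely illustrative.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c / H0 in Mpc/h


def _trapz(y, x):
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def comoving_distance(z, omega_m=0.3, n_steps=512):
    """Line-of-sight comoving distance [Mpc/h] in flat LCDM (assumed cosmology)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)
    return C_OVER_H0 * _trapz(inv_e, zz)


def sigma_crit_inv(z_l, z_s, omega_m=0.3):
    """Sigma_c^{-1}(z_l, z_s) up to the constant 4 pi G / c^2; zero if z_s <= z_l."""
    if z_s <= z_l:
        return 0.0
    chi_l = comoving_distance(z_l, omega_m)
    chi_s = comoving_distance(z_s, omega_m)
    d_l = chi_l / (1.0 + z_l)
    d_s = chi_s / (1.0 + z_s)
    d_ls = (chi_s - chi_l) / (1.0 + z_s)  # valid for a flat universe
    return d_l * d_ls / d_s


def pz_averaged_sigma_crit_inv(z_l, z_grid, pz, omega_m=0.3):
    """Eq. (19): integral of p(z) Sigma_c^{-1}(z_l, z) dz, with p(z) normalised to unit integral."""
    z_grid = np.asarray(z_grid, dtype=float)
    pz = np.asarray(pz, dtype=float)
    kernel = np.array([sigma_crit_inv(z_l, z, omega_m) for z in z_grid])
    return _trapz(pz * kernel, z_grid) / _trapz(pz, z_grid)


z_l, z_p = 0.3, 0.35
z_grid = np.linspace(0.0, 1.2, 601)
pz = np.exp(-0.5 * ((z_grid - z_p) / 0.1) ** 2)
print(sigma_crit_inv(z_l, z_p), pz_averaged_sigma_crit_inv(z_l, z_grid, pz))
```

Because $\Sigma_c^{-1}$ vanishes for $z \le z_l$ and varies rapidly just above the lens, a p(z) that straddles the lens redshift gives a different value from the point estimate; this is the regime that matters for sources with photoz just above the lens.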

The fact that the bias is nearly eliminated by using the full posterior p(z) is not a trivial result; when integrating over a p(z), many effects change the $\Sigma_c$ estimate in opposing directions. We have determined that the negative calibration bias is nearly eliminated because of the change in $\Sigma_c$ for sources with photoz near, but slightly above, the lens redshift. When using the photoz alone, $\Sigma_c$ was on average underestimated for these sources, owing to the way it varies with source redshift near the lens. Integrating over the full p(z) raises it to a more reasonable value, which both increases the signal calibration and lowers the weight given to these sources.

Figure 14. Lensing calibration bias $b_z(z_l)$ using photoz's alone versus using the full p(z) to compute $\Sigma_c$, as described in the text. Curves: full ZEBRA/SDSS p(z), ZEBRA/SDSS photoz, NN weighted p(z), NN/CC2 photoz, and NN/D1 photoz.

To understand this result in more detail, we consider Fig. 15, which shows the redshift distributions of the full spectroscopic sample from spectroscopy, from the NN/CC2 photoz, and from the summation of the p(z) for each object. As shown, the use of p(z) gives a mean redshift that is quite close to the mean redshift of the full sample, unlike the photoz's, which give a higher mean redshift. There is a slight suggestion that the p(z) for objects at z ~ 0.6 is spread to higher redshift, but these objects are a small fraction of the sample and the critical surface density varies only weakly with source redshift at such high redshifts, so this effect is not very important for lensing calibration with $z_l < 0.35$. It is this correction to the mean redshift, in combination with a realistic estimate of the scatter for each object when estimating $\Sigma_c$, that eliminates the non-negligible calibration bias obtained when using the NN/CC2 photoz's alone.



5.10 Avoiding physically-associated pairs 

One benefit of using photoz's instead of a source redshift distribution is that it is possible to eliminate some fraction of the "source" galaxies that are physically associated with the lenses. This is important because of intrinsic alignments, which can suppress the lensing signal (Agustsson & Brainerd 2006; Mandelbaum et al. 2006b).

In the absence of detailed calibration of the photoz error distribution, we can simply require $z_p > z_l + \epsilon$ for some $\epsilon$, with the best chance of success if the photoz method does not have a mean positive bias, $\langle z_p - z\rangle > 0$, at any redshift where there are lenses. Our current method (kphotoz), the neural network photoz's, and the ZEBRA/SDSS photoz's clearly fail this criterion. Of the methods under consideration here, only the SDSS template photoz's are optimal for avoiding the inclusion of physically-associated sources with this simple scheme. This is due to their negative photoz bias, which may be a liability in some other applications and which may cause us to exclude so many sources that the statistical error on the signal is strongly degraded.

Table 9. Parameters of fits to the redshift distribution from Eq. (9) for all photometric galaxies.

Sample          N      z_*              α              ⟨z⟩
19 ≤ r < 20     529    0.157 ± 0.021    4.04 ± 1.03    0.290 ± 0.015
20 ≤ r < 21     1446   0.196 ± 0.031    4.15 ± 1.20    0.363 ± 0.013
21 ≤ r < 22     2996   0.290 ± 0.022    3.08 ± 0.33    0.467 ± 0.017

Figure 15. Redshift distribution dp/dz for the full calibration sample using spectroscopic redshifts, NN/CC2 photoz's, and the NN p(z) for each object.

In the context of our previous work, the plots in section 5.6 make it quite apparent that our naive $z_p > z_l + 0.1$ cut, while the best we could do with only 162 spectroscopic redshifts with which to determine the photoz error distribution, was causing us to eliminate a significant fraction of true, lensed sources from the analysis, without even fulfilling our purpose of excluding nearly all the physically associated sources.

However, this analysis will help us fix this problem in the future. With a detailed understanding of the photoz error distribution from several thousand sources, we can simply construct a redshift distribution (see section 5.11) as a function of photoz, source colour and magnitude. This distribution gives us $p(z \mid z_p, r, \mathrm{colour})$. We can then choose to use only sources with

$$\int_{z_l}^{\infty} p(z \mid z_p, r, \mathrm{colour})\,dz > p_{\rm thres} \tag{20}$$

for some threshold probability $p_{\rm thres}$. The choice of $p_{\rm thres}$ will depend on the situation: it should be large for lens samples such as LRGs and clusters, in which intrinsic alignments of satellite ellipticities have been detected (Agustsson & Brainerd 2006; Mandelbaum et al. 2006b; Faltenbacher et al. 2007), and at small transverse separations (< 200 kpc), where the effect is similar to or larger than the statistical error. In other scenarios, such as at larger transverse separations, we may find that we can afford a lower $p_{\rm thres}$, even zero (because we are only using it to remove physically-associated sources to avoid intrinsic alignment contamination, not those with zero shear). A simpler alternative to this procedure for ZEBRA/SDSS and other similar methods that return a full posterior p(z) is to perform the integral in Eq. (20) using that p(z), provided that it is found to accurately describe the redshift distribution for galaxies of a given magnitude and colour.
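The selection rule of Eq. (20) can be applied once p(z) is tabulated on a grid for each source, whether from a calibrated lookup table or from an algorithm's own posterior as in the simpler alternative. A minimal sketch with hypothetical helper names; the cumulative integral uses the trapezoidal rule with linear interpolation at $z_l$, an approximation chosen here for simplicity, and the toy p(z)'s and threshold are illustrative.

```python
import numpy as np


def prob_behind_lens(z_grid, pz, z_l):
    """Left-hand side of Eq. (20): P(z > z_l) for one source, with p(z) tabulated on z_grid."""
    z_grid = np.asarray(z_grid, dtype=float)
    pz = np.asarray(pz, dtype=float)
    segments = 0.5 * (pz[1:] + pz[:-1]) * np.diff(z_grid)  # trapezoidal pieces
    cdf = np.concatenate(([0.0], np.cumsum(segments)))
    cdf /= cdf[-1]                                         # normalise p(z) to unit integral
    return 1.0 - float(np.interp(z_l, z_grid, cdf))


def select_sources(z_grid, pz_per_source, z_l, p_thres):
    """Boolean mask of sources passing the threshold of Eq. (20)."""
    return np.array([prob_behind_lens(z_grid, pz, z_l) > p_thres for pz in pz_per_source])


z_grid = np.linspace(0.0, 1.0, 101)
far_source = np.exp(-0.5 * ((z_grid - 0.5) / 0.05) ** 2)   # well behind a lens at z_l = 0.3
near_source = np.exp(-0.5 * ((z_grid - 0.1) / 0.02) ** 2)  # in front of it
print(select_sources(z_grid, [far_source, near_source], z_l=0.3, p_thres=0.9))
```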

Note that once we have applied such a cut on the source sample, the true redshift distribution of those sources changes, so we must re-estimate the lensing calibration bias; and if we had chosen to deconvolve the photoz error distribution for more accurate estimation of the critical surface density, we would have to redo that procedure as well. This is one major reason we have chosen to estimate the calibration bias using photoz's directly.

Fig. 6 suggests that the optimal methods for excluding physically-associated sources with this more sophisticated method are the NN and ZEBRA/SDSS methods, because they lack the failure modes that would complicate this procedure (i.e., because their error distributions are more compact, and therefore easier to sample fully using a spectroscopic sample of limited size, and because their p(z) will not be multimodal as for the other methods). This statement applies to samples of galaxies reasonably similar to those presented here, but would need to be re-evaluated for samples that are much deeper, bluer, and/or at significantly higher redshift.



5.11 Without lensing selection 

Here we show some results for a full flux-limited sample of redshifts from zCOSMOS and DEEP2. The difference between these and the previous results is that here, we do not impose the lensing selection cuts. Instead, we simply require a match in the SDSS reductions (rerun 137) within 1″ of the spectrum from zCOSMOS or DEEP2.

For this test, we use 3415 photometric galaxies from SDSS with r < 22 that have spectra from zCOSMOS (or zCOSMOS photoz's for the 8% with redshift reliability < 99%), and 1761 from DEEP2. Figure 16 shows the redshift histograms p(z) in magnitude bins one magnitude wide, with best-fit redshift distributions using the functional form in Eq. (9). The best-fit parameters are tabulated in Table 9. For these results, we have again included the DEEP2 selection probabilities; however, the selection is so flat for the magnitude range shown here that the effect on the final results is negligible.

We also use these results to test the effects of lensing selection, using the ZEBRA/SDSS photoz's as an example. Figure 17 shows the effects of lensing selection on the apparent magnitude, redshift, and photoz histograms. Here we require r < 21.8 rather than r < 22 in order to compare more readily against our source catalog; this cut reduces the number of matches in the flux-limited sample by 13%. The magnitude distribution in the flux-limited sample does not rise as sharply as expected at the very faint end because of difficulties with star/galaxy separation in SDSS. A previous comparison with HST data (Lupton et al. 2001) found that the default SDSS star/galaxy separation tends to err on the side of classifying galaxies as stars rather than vice versa, causing the galaxy counts to flatten for r > 21.5 in a way that depends on the seeing (more flattening in worse seeing).

Figure 16. Redshift distributions for all photometric galaxies without lensing selection.

Figure 17. Magnitude (top), redshift (middle), and ZEBRA/SDSS photoz (bottom) histograms for the full flux-limited sample and for the lensing sources. For the magnitude histogram, we have normalized both to the same number of galaxies so the fraction that pass our cuts as a function of magnitude will be apparent. For the redshift and photoz histograms, the histograms for both the full and the lensing-selected sample are normalized to integrate to unity.



As shown, the lensing selection rate is a strong function of r-band magnitude, ranging from nearly one around r ~ 19 to ~0.3 around the flux limit of 21.8. Nonetheless, the redshift distribution is nearly the same for the full and the lensing-selected samples. This non-trivial result requires some explanation, since we have already established (a) in the top panel of Fig. 17 that the flux-limited sample is fainter on average than the lensing-selected sample, and (b) in Fig. 16 that fainter samples are on average at higher redshift. Reconciling these facts requires that at a given apparent magnitude, the lensing-selected sample is at higher redshift than the flux-limited sample.

To explain this result, we consider two early-type galaxies at the same apparent magnitude but different redshifts $z_1$ and $z_2 > z_1$, in the limit that the difference in their redshifts is small enough that the k-correction connecting the bandpasses at the two redshifts is negligible. In that case, the more distant galaxy is more luminous by a factor of $[D_L(z_2)/D_L(z_1)]^2$ (where $D_L$ is the luminosity distance). For early-type galaxies, the physical size of the galaxy is related to luminosity via a power law $R \propto L^{\beta}$ (e.g., Bernardi et al. 2007), so the more distant galaxy is intrinsically larger than the nearer one by a factor of $[D_L(z_2)/D_L(z_1)]^{2\beta}$. The angular size of the more distant galaxy relative to the nearer one is reduced by a factor of $D_A(z_1)/D_A(z_2)$ ($D_A$ is the angular diameter distance). We therefore conclude that before convolution with the PSF, the factor due to the intrinsic luminosity and size difference wins out over the factor due to the decreased angular size, so the more distant galaxy is actually larger. This argument suggests that if one of the galaxies is eliminated by our apparent size cut, it is the one at lower redshift. This counter-intuitive argument (which may explain our finding above, that the lensing-selected redshift distribution is the same as the flux-limited one despite being brighter on average) is not nearly the full story, because (a) in many situations, k-corrections or luminosity evolution will change the outcome, and (b) not all galaxies are early types following this scaling relation between luminosity and size; but it appears to be a strong enough effect to balance out the difference in mean depth between the samples. One must also consider the effects of the luminosity function: galaxies at the same magnitude but higher redshift are fewer in number, so while they are less likely to be eliminated by an apparent size cut, they are also rarer to begin with.
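The competition between intrinsic size and angular-diameter distance can be made concrete. This sketch assumes a flat ΛCDM cosmology with $\Omega_m = 0.3$, ignores k-corrections and evolution as in the text, and uses β = 0.6 as a placeholder slope for $R \propto L^\beta$ (not a value taken from the text). The helper names are hypothetical.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c / H0 in Mpc/h


def comoving_distance(z, omega_m=0.3, n_steps=512):
    """Line-of-sight comoving distance [Mpc/h] in flat LCDM (assumed cosmology)."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + 1.0 - omega_m)
    return C_OVER_H0 * float(np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz)))


def angular_size_ratio(z1, z2, beta, omega_m=0.3):
    """theta(z2) / theta(z1) for two galaxies of equal apparent magnitude.

    Ignores k-corrections and evolution; assumes R ∝ L**beta.
    """
    chi1 = comoving_distance(z1, omega_m)
    chi2 = comoving_distance(z2, omega_m)
    d_l1, d_l2 = chi1 * (1.0 + z1), chi2 * (1.0 + z2)  # luminosity distances
    d_a1, d_a2 = chi1 / (1.0 + z1), chi2 / (1.0 + z2)  # angular-diameter distances
    size_ratio = (d_l2 / d_l1) ** (2.0 * beta)         # L ∝ D_L^2 at fixed flux, R ∝ L^beta
    return size_ratio * d_a1 / d_a2


print(angular_size_ratio(0.30, 0.35, beta=0.6))  # above 1: the more distant galaxy appears larger
```

With β = 0 (no size-luminosity relation) the same function returns a value below one, so the outcome of the argument hinges on the slope of the size-luminosity relation.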

Table 10. Mean photoz bias and scatter for the ZEBRA/SDSS algorithm as a function of colour and magnitude for all photometric (flux-limited) and lensing-selected galaxies.

                              Flux-limited        Lensing-selected
Colour   Magnitude            bias     scatter    bias     scatter
Red      r < 19.6              0.038   0.082       0.039   0.085
Red      19.6 ≤ r < 20.4       0.029   0.098       0.035   0.101
Red      20.4 ≤ r < 21.1       0.029   0.118       0.039   0.119
Red      r > 21.1              0.017   0.126       0.013   0.126
Blue     r < 20.4              0.004   0.123       0.008   0.110
Blue     20.4 ≤ r < 21.0      -0.034   0.173      -0.025   0.143
Blue     21.0 ≤ r < 21.35     -0.060   0.181      -0.043   0.154
Blue     r > 21.35            -0.104   0.201      -0.114   0.187

As a test of this unexpected finding, we fit redshift distributions to the lensing-selected galaxies as a function of apparent magnitude, and compare to the mean redshifts in Table 9. For the flux-limited samples with 19 ≤ r < 20, 20 ≤ r < 21, and 21 ≤ r < 22, we find mean redshifts of 0.290 ± 0.015, 0.363 ± 0.013, and 0.467 ± 0.017. For the lensing-selected samples with the same cuts on apparent magnitude, we find mean redshifts of 0.287 ± 0.015 (well within 1σ of the flux-limited sample), 0.372 ± 0.015 (0.5σ higher than the flux-limited sample), and 0.484 ± 0.015 (0.7σ higher than the flux-limited sample). The results for the faintest sample are the most remarkable, because the flux-limited sample used for the fits is cut at r = 22, whereas the lensing-selected sample is cut at r = 21.8, so the lensing-selected sample is brighter on average, yet it is at slightly higher redshift. The effect is fortuitously of just the right size that, despite the full lensing-selected sample being brighter, its redshift distribution is nearly the same as that of the flux-limited sample.
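The quoted significances are consistent with treating the two mean redshifts as independent measurements with errors added in quadrature. That is an assumption (the text does not state how the errors were combined), and only an approximation, since the lensing-selected galaxies are a subset of the flux-limited sample; the helper name is hypothetical.

```python
import math


def difference_in_sigma(mean_a, err_a, mean_b, err_b):
    """Difference of two measurements in units of their errors added in quadrature."""
    return (mean_b - mean_a) / math.hypot(err_a, err_b)


flux_limited = [(0.290, 0.015), (0.363, 0.013), (0.467, 0.017)]
lensing_selected = [(0.287, 0.015), (0.372, 0.015), (0.484, 0.015)]
for (m_a, e_a), (m_b, e_b) in zip(flux_limited, lensing_selected):
    print(f"{difference_in_sigma(m_a, e_a, m_b, e_b):+.2f} sigma")
```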

Next, we present photoz error distributions as a function of colour and magnitude for the full and the lensing-selected samples. We split the sample by colour because photoz's are easier to compute for red galaxies than for blue ones, owing to their clearer colour-redshift relation. Our colour separator is redshift-dependent and purely empirical, based on the sample properties: $g - i = 0.7 + 2.67z$. The slope was chosen to roughly trace the observed colour of the red ridge, with 40% of the galaxies classified as red. Within each colour, we then split into bins with roughly equal numbers of galaxies based on magnitude, so the magnitude bins are different for each colour. While we tabulate the mean photoz bias, $\langle z_p - z\rangle$, in analogy to earlier in this paper, the plots show $p(z - z_p)$, since that can be used in combination with $p(z_p)$ to reconstruct $p(z \mid r, g - i)$.
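The colour split and per-bin statistics described above can be sketched as follows. Assumptions: galaxies above the line $g - i = 0.7 + 2.67z$ are classified as red, the redshift entering the cut is supplied by the caller, equal-number magnitude bins are defined by quantiles, and "scatter" is taken to be the standard deviation of $z_p - z$. The function names and toy data are hypothetical.

```python
import numpy as np


def is_red(g_minus_i, z):
    """Empirical colour cut: red if g - i lies above 0.7 + 2.67 z (assumed direction)."""
    return np.asarray(g_minus_i, dtype=float) > 0.7 + 2.67 * np.asarray(z, dtype=float)


def equal_count_edges(mag, n_bins):
    """Magnitude bin edges giving roughly equal numbers of galaxies per bin."""
    return np.quantile(np.asarray(mag, dtype=float), np.linspace(0.0, 1.0, n_bins + 1))


def bias_and_scatter(z_phot, z_spec):
    """Mean photoz bias <z_p - z> and the standard deviation of z_p - z."""
    dz = np.asarray(z_phot, dtype=float) - np.asarray(z_spec, dtype=float)
    return float(dz.mean()), float(dz.std())


rng = np.random.default_rng(3)
z = rng.uniform(0.1, 0.6, 500)
g_minus_i = 0.7 + 2.67 * z + rng.normal(0.0, 0.3, 500)  # toy colours
r_mag = rng.uniform(18.0, 22.0, 500)
z_phot = z + rng.normal(0.02, 0.1, 500)                  # toy photoz's

red = is_red(g_minus_i, z)
edges = equal_count_edges(r_mag[red], 4)
# Assign each red galaxy to a magnitude bin; the faintest edge goes into the last bin
idx = np.clip(np.searchsorted(edges, r_mag[red], side="right") - 1, 0, len(edges) - 2)
for b in range(len(edges) - 1):
    sel = idx == b
    print(b, bias_and_scatter(z_phot[red][sel], z[red][sel]))
```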

Because it would take a significant amount of space to present the distributions as a function of photoz, we average them over all values of photoz. Table 10 shows the mean bias and scatter as a function of colour and magnitude. Figure 18 shows the error distributions as a function of colour and magnitude, together with a Gaussian with the sample mean bias and scatter, to make any non-Gaussianity apparent.

As shown in Table 10, the imposition of lensing selection seems to slightly decrease the scatter for blue galaxies, but has little effect for red galaxies. Figure 18 shows that for red galaxies, the photoz error distributions are slightly non-Gaussian, whereas for blue galaxies they are significantly non-Gaussian. We also see the same pattern as for kphotoz: a positive photoz bias for red galaxies and a negative one for blue galaxies, with different sizes for the scatter. These trends will strengthen the correlation we have previously noted between LSS and photoz error. We have not attempted any more complex functional modeling, e.g. double Gaussians, but future work will use the true (smoothed) distributions rather than the Gaussians.

5.12 Star/galaxy separation results 

We also matched our source catalog against a catalog of objects from COSMOS with stellarity information. Their space-based photometry allows a more reliable star/galaxy classification than is possible in SDSS. Here we use their stellarity information, which is determined using both the SExtractor CLASS_STAR parameter and visual inspection, as follows:

• Those with CLASS_STAR ≥ 0.8 are automatically counted as stars, without visual inspection.

• Those with CLASS_STAR < 0.8 are visually inspected, and the star/galaxy classification is based on the inspection.

Figure 18. Photoz error distributions for the ZEBRA/SDSS method as a function of colour and magnitude, for red (left) and blue (right) photometric galaxies, for the full flux-limited sample (black, solid) and for the lensing sources (red, dashed). We also show the Gaussians with the mean and scatter from Table 10.

Of the 7028 matches between the COSMOS catalog and 
our source catalog, 67 are identified in COSMOS as stars, or 
0.95%. This number is constrained to be within [0.74, 1.21]% 
at the 95% CL assuming Poisson errors. To check whether 
this number is typical compared to the rest of the survey, 
we compute the mean r-band seeing in the COSMOS area 
compared to the entire SDSS survey area, and find that the 
mean seeing in the area that overlaps with COSMOS is 1.20" 
(PSF FWHM), compared to 1.18" in the rest of the survey. 
We therefore conclude that this number is fairly typical and 
may be applied as a correction to the entire source catalog, 
provided that the stellar contamination fraction is not an 
extremely strong function of the PSF FWHM. 
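The 95% CL range quoted above can be checked with an exact Poisson interval. The sketch below assumes that the quoted range is a central (Garwood) interval on the 67 star counts, divided by the 7028 matches. That is our assumption, since the text says only "assuming Poisson errors". It uses only the standard library: the Poisson CDF is summed in log space and the limits are found by bisection.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed in log space for stability."""
    if mu <= 0:
        return 1.0
    return sum(math.exp(i * math.log(mu) - mu - math.lgamma(i + 1)) for i in range(k + 1))


def _bisect(f, lo, hi, tol=1e-10):
    """Root of a monotonic function f on [lo, hi], found by bisection."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def poisson_interval(n, cl=0.95):
    """Central (Garwood) confidence interval on a Poisson mean, given n counts."""
    alpha = 1.0 - cl
    top = n + 20.0 * math.sqrt(n + 1.0) + 20.0  # generous upper search bound
    # Lower limit: mu such that P(X >= n | mu) = alpha / 2.
    lower = 0.0 if n == 0 else _bisect(
        lambda mu: (1.0 - poisson_cdf(n - 1, mu)) - alpha / 2, 0.0, top)
    # Upper limit: mu such that P(X <= n | mu) = alpha / 2.
    upper = _bisect(lambda mu: poisson_cdf(n, mu) - alpha / 2, 0.0, top)
    return lower, upper


lo, hi = poisson_interval(67)
print(f"{100 * 67 / 7028:.2f}% in [{100 * lo / 7028:.2f}, {100 * hi / 7028:.2f}]%")
```

Under this assumption, the interval expressed as a fraction of the 7028 matches falls close to the [0.74, 1.21]% range given in the text.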

To test for this possibility, we have used three SDSS runs that overlap the COSMOS region and have r-band PSF FWHM ranging from 0.9 to 1.4", a range that includes ~85% of the source sample across the SDSS survey area. We then determined the stellar contamination fraction in bins of PSF FWHM after application of all lensing selection criteria. For the four bins with median PSF FWHM of 1.02, 1.14, 1.21 and 1.3", the stellar contamination fractions are 1.04%, 0.92%, 0.79% and 0.56%, respectively. The trend of decreasing stellar contamination in poorer seeing is not well understood; however, the mean source number density also decreases in poor seeing, so our cuts may be overly conservative in regions of poor seeing. When Poisson error bars are included, this trend is not quite significant at the 2σ level. However, it is apparent that the stellar contamination fraction does not rise sharply in any part of this range of PSF FWHM, which includes nearly all of the source sample, so we conclude that our value of 0.95% should apply to the rest of the source catalog.

Figure 19. Fraction of the weight for our three source samples that is attributed to stellar contamination as a function of lens redshift.

To properly apply this number to the rest of the source sample, we must take into account that the number density of stars depends on galactic latitude $b$ in some complex way. The average $\langle 1/\sin b\rangle$ is 1.40 for the whole source catalog and 1.43 for the COSMOS region, so we conclude that no correction for the variation of stellar density with galactic latitude is necessary. This calculation would not work if we included regions where $\sin b \approx 0$, because of the strong increase in stellar number density there. However, our requirement that the r-band extinction be less than 0.2 magnitudes effectively eliminates these regions from the source catalog.
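The latitude comparison above reduces to averaging $1/|\sin b|$ over object positions. Our interpretation is that this acts as a plane-parallel proxy for the column of Galactic stars along each line of sight. The sketch below is a minimal illustration of that arithmetic. The function names are hypothetical, and the input is a list of latitudes in degrees rather than any specific catalog format. The values quoted in the text differ by about 2 per cent (1.43 versus 1.40).

```python
import math


def mean_inverse_sin_b(b_deg):
    """Average of 1/|sin b| over galactic latitudes b given in degrees.

    Diverges as |b| -> 0, which is why low-latitude regions must be excluded
    (in the text, via the r-band extinction cut) for this average to be useful.
    """
    if not b_deg:
        raise ValueError("need at least one latitude")
    return sum(1.0 / abs(math.sin(math.radians(b))) for b in b_deg) / len(b_deg)


def relative_stellar_column(region_b_deg, survey_b_deg):
    """Ratio of <1/|sin b|> in a calibration region to that of the full survey."""
    return mean_inverse_sin_b(region_b_deg) / mean_inverse_sin_b(survey_b_deg)
```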

However, we cannot conclude that the fractional contamination in the lensing signal is −0.0095, because it depends on the weight given to these sources. The total fraction of the weight attributed to the stellar contamination as a function of lens redshift is shown in Fig. 19 for the three source redshift determination methods used in our current catalog. As shown, the fraction of the weight attributed to stars is in general larger than the actual stellar contamination fraction. This fraction rises significantly with lens redshift for the r < 21 sample because the stellar contaminants tend to be assigned relatively high photoz's: they are predominantly M stars that masquerade as red galaxies at the high end of the redshift range for this sample. However, as shown in Fig. 7, the r > 21 sample has four times as much weight at these lens redshifts, so the contamination of the signal is not strongly affected by this increase in the contamination fraction for the r < 21 sample.
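The weight fraction plotted in Fig. 19 can be illustrated with a toy calculation. This sketch makes several assumptions of its own. It uses a flat ΛCDM cosmology with a hypothetical $\Omega_m = 0.3$. It assigns lensing weights $w_j \propto \Sigma_{c,j}^{-2}/V_j$, where $V_j$ is a per-source shape-noise variance, and evaluates $\Sigma_{c,j}^{-1}$ at the photoz. At fixed lens redshift, $\Sigma_c^{-1} \propto 1 - \chi_l/\chi_s$ for $z_s > z_l$. The lens-only prefactor cancels in the ratio. All names are hypothetical.

```python
import math


def comoving_distance(z, omega_m=0.3, n_steps=200):
    """Comoving distance in units of c/H0 for flat LCDM (Simpson's rule)."""
    if z <= 0:
        return 0.0
    n_steps += n_steps % 2  # Simpson's rule needs an even number of steps
    h = z / n_steps

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1 + x) ** 3 + (1 - omega_m))

    s = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * inv_e(i * h)
    return s * h / 3.0


def lensing_efficiency(z_lens, z_source, omega_m=0.3):
    """Sigma_crit^{-1} up to a lens-only factor: 1 - chi_l/chi_s, or 0 if z_s <= z_l."""
    if z_source <= z_lens:
        return 0.0
    return 1.0 - comoving_distance(z_lens, omega_m) / comoving_distance(z_source, omega_m)


def star_weight_fraction(z_lens, z_phot, is_star, shape_var):
    """Fraction of the total lensing weight carried by objects flagged as stars."""
    w = [lensing_efficiency(z_lens, zp) ** 2 / v for zp, v in zip(z_phot, shape_var)]
    total = sum(w)
    if total == 0:
        return 0.0
    return sum(wi for wi, s in zip(w, is_star) if s) / total


# Toy input: two galaxies and one star-like contaminant with a high photoz.
z_phot = [0.35, 0.35, 0.5]
is_star = [False, False, True]
for z_l in (0.1, 0.2, 0.3, 0.4):
    print(z_l, round(star_weight_fraction(z_l, z_phot, is_star, [1.0, 1.0, 1.0]), 3))
```

Because the contaminant sits at a high photoz, its share of the weight grows with lens redshift even though it is only one of three objects. This is qualitatively the behaviour described for the r < 21 sample.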

6 DISCUSSION 

In this paper, we have proposed a method for precision calibration of the source redshift distribution for g-g lensing with lens spectroscopy, using representative subsamples of the source catalog with spectroscopy. The key components of this method are an estimator for the g-g lensing calibration bias (Eq. 5) and one for the degradation of the statistical error due to non-optimal weighting (Eq. 17). This method includes techniques for handling complications such as large-scale structure in the spectroscopic redshift sample, and redshift failure. We then demonstrated its implementation by matching an SDSS lensing catalog used for many previous science works against a sample of spectroscopic redshifts from DEEP2 and zCOSMOS. We have also used this method to assess the utility of three more recent photoz algorithms that have been proposed for use with SDSS data. In Appendix A we discuss the extension of these techniques to g-g lensing with lens photoz's, to g-g lensing with redshift distributions for the lenses, and to cosmic shear.

Our results in Section 5 show that the galaxy-galaxy lensing calibration bias can be as high as 20-30% for some of the photoz methods, especially for higher lens redshifts. This is despite the fact that for all of the photoz methods, the average redshift bias is well below the scatter. The reason for this finding is the nonlinear dependence of the critical surface density on the source redshift, which amplifies the photoz errors in a highly asymmetric way. An underestimate of the photoz to a value below the lens redshift leads to rejection of the source galaxy and does not produce lensing bias. An overestimate, by contrast, leads to an enhancement of the lensing weight and can produce a significant bias. One of the main lessons of the present work is that lensing applications require a dedicated photoz calibration, which can give very different results from general photoz calibration tests.
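The asymmetry described above can be made concrete with a toy version of the calibration bias. The paper's estimator is Eq. 5, which is not reproduced in this section, so the form used here is our own assumption. The estimated signal is $\sum_j w_j \Sigma_{c,P,j}\gamma_{t,j} / \sum_j w_j$ with weights $w_j \propto \Sigma_{c,P,j}^{-2}$ and identical shape noise. The true shear is $\gamma_{t,j} = \Delta\Sigma\, \Sigma_{c,S,j}^{-1}$. This gives $1 + b_z = \sum_j \eta_{P,j}\eta_{S,j} / \sum_j \eta_{P,j}^2$, where $\eta \propto \Sigma_c^{-1}$ is evaluated at the photoz (P) or the spectroscopic redshift (S). Flat ΛCDM with $\Omega_m = 0.3$ and all names are hypothetical.

```python
import math


def comoving_distance(z, omega_m=0.3, n_steps=200):
    """Comoving distance in units of c/H0 for flat LCDM (Simpson's rule)."""
    if z <= 0:
        return 0.0
    n_steps += n_steps % 2
    h = z / n_steps

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1 + x) ** 3 + (1 - omega_m))

    s = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * inv_e(i * h)
    return s * h / 3.0


def lensing_efficiency(z_lens, z_source, omega_m=0.3):
    """Sigma_crit^{-1} up to a lens-only factor (flat universe); 0 behind the lens."""
    if z_source <= z_lens:
        return 0.0
    return 1.0 - comoving_distance(z_lens, omega_m) / comoving_distance(z_source, omega_m)


def calibration_bias(z_lens, z_spec, z_phot, omega_m=0.3):
    """Toy b_z at one lens redshift: sum(eta_P * eta_S) / sum(eta_P^2) - 1.

    Sources with photoz below the lens get zero weight (rejected); sources
    whose true redshift is below the lens dilute the signal (eta_S = 0).
    """
    eta_p = [lensing_efficiency(z_lens, zp, omega_m) for zp in z_phot]
    eta_s = [lensing_efficiency(z_lens, zs, omega_m) for zs in z_spec]
    denom = sum(p * p for p in eta_p)
    if denom == 0:
        raise ValueError("no sources with photoz behind the lens")
    return sum(p * s for p, s in zip(eta_p, eta_s)) / denom - 1.0


# Two sources at z = 0.3 scattered symmetrically (zero mean photoz bias):
# the low one falls below the lens and is rejected; the high one is kept.
print(calibration_bias(0.25, [0.3, 0.3], [0.2, 0.4]))
```

In the usage example, the mean photoz bias is zero, yet $b_z$ is clearly nonzero (negative in this sign convention), because only the overestimated source survives the $z_p > z_l$ cut.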

Our analysis demonstrates that, in previous works based on this SDSS source catalog, the calibration bias in the lensing signal due to redshift distribution uncertainty was well within the quoted systematic error of 8%. Future lensing work using this source catalog will use the results in this paper to obtain a highly accurate lensing calibration, with a smaller uncertainty than in our previous work. This work reduces the systematic error due to redshift calibration uncertainty to ~2%, a timely improvement for SDSS g-g lensing measurements. Results coming out in the next year will have a total statistical error of ~5%, so the reduction in the systematic error is necessary to ensure that it does not exceed the statistical error.

For the three new photoz methods tested here, we have measured the lensing calibration bias using a statistic $b_z$ (Eq. 5), which is optimized for characterizing photoz's for galaxy-galaxy lensing purposes. Another statistic, in Eq. 17, can be used to determine how much a photoz method causes a deviation from optimal weighting, affecting the statistical error of the measurement. We have also carefully identified important aspects of the photoz error distribution. We found that for our source sample, using the SDSS template photoz's (without any corrections for mean photoz bias) led to the smallest lensing calibration bias. This result is due to a fortuitous cancellation of lensing calibration biases due to photoz bias and scatter, and would not necessarily happen with a sample with different selection criteria. For some applications, the presence of a failure mode that sends sources to zero redshift would be quite problematic. It does not, however, cause any bias for lensing (though as we have already shown, it leads to increased statistical error on the lensing signal). The SDSS neural net photoz's and the ZEBRA/SDSS photoz's both cause significant lensing calibration bias, despite having a reasonable scatter, because of a significant positive photoz bias for < z < 0.4. This calibration bias can be corrected for after computation of the lensing signal using a calibration factor, since our spectroscopic sample has the same selection as the full catalog. If the mean photoz bias is corrected for before computing the lensing signal, the SDSS neural net photoz's lead to smaller lensing calibration bias than the other two new methods, implying that the effects of photoz scatter are smaller for this method. To some extent, once a reliable calibration of the photoz's for lensing is known for a given source sample, the fact that a photoz method causes calibration bias is unimportant. Two other effects are more important: the deterioration of the statistical error due to non-optimal weighting, and the inability to properly remove physically associated sources. In that sense, the negative photoz bias of the template photoz code, which is the cause of its low lensing calibration bias, may in fact be a liability for its practical use.
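The loss of statistical power from non-optimal weighting can be illustrated with a simple signal-to-noise ratio. This is not necessarily the statistic of Eq. 17, whose form is not reproduced in this section. The sketch assumes identical shape noise for all sources, weights $w_j \propto \eta_{P,j}^2$, and an estimator $\sum_j w_j \Sigma_{c,P,j}\gamma_{t,j}/\sum_j w_j$. Here $\eta \propto \Sigma_c^{-1}$ is evaluated at the photoz (P) or the true redshift (S). Under these assumptions, the S/N relative to weighting with true redshifts is $\sum_j \eta_P\eta_S / \sqrt{\sum_j \eta_P^2 \sum_j \eta_S^2}$. By the Cauchy-Schwarz inequality this is at most 1. It is also unchanged by any multiplicative calibration correction. The function name is hypothetical.

```python
import math


def weighting_efficiency(eta_true, eta_phot):
    """S/N with photoz-based weights divided by S/N with true-redshift weights.

    eta_true, eta_phot: per-source Sigma_crit^{-1} (up to a common lens factor)
    evaluated at the true redshift and at the photoz, respectively.
    """
    num = sum(p * s for p, s in zip(eta_phot, eta_true))
    norm = math.sqrt(sum(p * p for p in eta_phot) * sum(s * s for s in eta_true))
    if norm == 0:
        raise ValueError("no usable sources")
    return num / norm


# Toy example: one of two equally useful sources is wrongly placed in front
# of the lens (eta_phot = 0), so it is discarded and the S/N drops to 1/sqrt(2).
print(weighting_efficiency([0.5, 0.5], [0.5, 0.0]))
```

Squaring this ratio gives the fractional loss in effective source number, which is one way to quote the deterioration of the statistical error for a given photoz method.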

We have isolated ways that sampling variance can complicate the estimation of redshift calibration bias using a small subsample of galaxies. LSS tends to change the fractions of blue and red galaxies, which generally have different photoz error distributions. As a result, it can bias the estimated lensing calibration bias ($b_z$), and can also artificially reduce its estimated error. We have verified that our use of two uncorrelated degree-scale redshift samples drastically reduces this effect, making it negligible for our analysis.

We have also assessed the level of stellar contamina- 
tion in our source catalog using COSMOS data, and have 
placed stringent limits on the systematic error due to this 
contamination. 

We have tested the use of a full $p(z)$ for estimation of the critical surface density, and find that it tends to give superior results to the use of the photoz alone, with calibration biases consistent with zero for all lens redshift distributions considered in this paper. Because of this success, we advocate further work exploring the use of a full $p(z)$ for lensing rather than a single photoz for each object.

We have learned that the details of the photoz bias and scatter as a function of redshift are important. For example, the mean bias for sources with redshift within $\Delta z \sim 0.2$ of the lenses is more important than the overall mean photoz bias. In the extension of this formalism to higher redshift, it is important to consider that both the size of the photoz error and the derivative $d\Sigma_c/dz_s$ determine the redshift calibration bias. Deeper surveys that can ensure a larger separation between the lenses and sources may therefore find smaller redshift calibration bias, even with comparable or larger photoz errors than for the methods demonstrated here. However, these deeper surveys may have a larger systematic uncertainty due to spectroscopic redshift failure. Our high redshift success rate meant that we were not very sensitive to this problem, but that high success rate was also a product of the relatively bright magnitude of the source sample.

For deeper surveys with a higher redshift failure rate, one can imagine two possible scenarios. The first is that the higher failure rate is due to the lower S/N of the spectra. In that case, the failure rate as a function of apparent magnitude and colour can be quantified and included as a weight in the lensing calibration bias calculation. We would assume that for a given magnitude and colour the redshift distribution is properly sampled despite redshift failure, so we upweight galaxies in regions of parameter space where failure is more likely. The second case is more pernicious: if there is a region of colour and magnitude space for which essentially all the redshifts are failures, then no amount of reweighting can account for this. Consequently, for proper redshift calibration, one would need either to remove those sources entirely, since they cannot be calibrated, or to obtain external information from another spectrograph that is capable of obtaining redshifts for that region of colour space.
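The first scenario above (reweighting by the inverse success rate in cells of magnitude and colour) and the check for the second (cells with no successful redshifts) can be sketched as follows. This is a minimal illustration. The cell labels are hypothetical placeholders for whatever magnitude and colour binning is adopted, and the weights assume, as in the text, that successful redshifts fairly sample the redshift distribution within each cell.

```python
from collections import defaultdict


def failure_weights(cells, has_redshift):
    """Inverse-success-rate weights for a spectroscopic calibration sample.

    cells: hashable (magnitude bin, colour bin) label per target (hypothetical).
    has_redshift: True where a secure redshift was obtained.
    Returns (weights, uncalibratable): weights[i] = N_targets / N_success in the
    target's cell for successes and 0 for failures; uncalibratable lists cells
    with no successes, which no reweighting can recover.
    """
    n_tot, n_ok = defaultdict(int), defaultdict(int)
    for c, ok in zip(cells, has_redshift):
        n_tot[c] += 1
        if ok:
            n_ok[c] += 1
    weights = [n_tot[c] / n_ok[c] if ok else 0.0 for c, ok in zip(cells, has_redshift)]
    uncalibratable = sorted(c for c in n_tot if n_ok[c] == 0)
    return weights, uncalibratable


cells = [(0, 0)] * 4 + [(1, 0)] * 2
ok = [True, True, False, False, False, False]
print(failure_weights(cells, ok))
```

In each calibratable cell the weights of the successes sum to the number of targets in that cell, so the cell keeps its share of the source population. Cells returned as uncalibratable correspond to the second, more pernicious scenario.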

In summary, the results in this work resoundingly verify our claim that the spectroscopic sample used to assess photoz error for lensing purposes must have the same selection as the source catalog, or a selection close enough that it can be made comparable by a reweighting scheme (see Section 4.4). The photoz error is a strong function of galaxy type and apparent magnitude, and the lensing calibration is very sensitive to details of the photoz error distribution. We have also shown that at least two independent degree-scale patches of the sky must be surveyed in order to suppress the sampling variance effects on photoz calibration. This choice would have to be re-evaluated for deeper surveys, as would our choice of redshift histogram bins of width $\Delta z = 0.05$. Having two independent spectroscopic surveys, DEEP2 and zCOSMOS, with nearly 3000 galaxies in total, allowed us to provide photoz calibration of the galaxy-galaxy lensing signal at the per cent level, depending on the lens sample. As more spectroscopic redshift surveys become available, it will become easier for weak lensing measurements to be carried out with tight constraints on the redshift calibration bias using this method. This is one more important step towards galaxy-galaxy lensing becoming a high-precision tool for addressing questions of astrophysical and cosmological importance. Similar calibration methods must also be developed and applied to other weak lensing applications, most notably galaxy-galaxy lensing in the case where lens redshifts are not known, and shear-shear autocorrelations. We discuss the steps that would be needed for such a process in Appendix A.
APPENDIX A: EXTENSION TO OTHER LENSING MEASUREMENTS

In this paper we have demonstrated the lensing calibration 
using SDSS g-g lensing data with lens redshifts. Here we dis- 
cuss the extension of this analysis to other lensing scenarios, 
particularly 

(i) Galaxy-galaxy lensing with lens photoz's instead of 
spectroscopic redshifts; 

(ii) Galaxy-galaxy lensing with redshift distributions for 
both lenses and sources; and 

(iii) Cosmic shear (shear-shear autocorrelations) with photoz's or redshift distributions for the source sample.

We discuss the first case on its own, and the second and 
third together. 

A1 G-g lensing with lens photoz's

The first case, g-g lensing with photoz's for the lenses, involves the same lensing formalism as for g-g lensing with spectroscopic redshifts. We simply require an additional spectroscopic calibration sample for the lenses to trace their photoz error distribution. Several effects carry over from the spectroscopic-lens case. These are the multiplicative calibration bias $b_z$ (Eqs. 5 and 6), which will now include contributions from the lens photoz error distribution, the increased variance due to non-optimal weighting (Eq. 17), and the systematic calibration uncertainty due to sampling variance in the calibration sample. In addition to these, there is one additional effect to consider.

The conversion to transverse separation $R$, used to bin the stacked sources for comparison against theoretical predictions, depends on the lens redshift. In our formalism, which uses comoving coordinates, $R = \theta_{ls} D_A(z_l)(1+z_l)$, where $\theta_{ls}$ is the angular separation between the lens and source in radians. When using photoz's for lenses, we can define an estimated separation $\hat{R}$ determined using the lens photoz. Consequently, the measured lensing signal $\widehat{\Delta\Sigma}(\hat{R})$ can be expressed as an integral over the photoz error distribution:

$$\widehat{\Delta\Sigma}(\hat{R}) = \int_0^\infty \Delta\Sigma(R)\, p_L(R \mid \hat{R})\, dR \qquad \mathrm{(A1)}$$

where $p_L(R \mid \hat{R})$ represents the probability, given the lens photoz error distribution, that a source at separation $R$ will be put at estimated separation $\hat{R}$. This probability can be obtained trivially from the lens photoz error distribution expressed as $p_L(z_p \mid z)$, using the transformation from redshift to transverse separation and the derivative $dR/dz$. Even for relatively simple models for $\Delta\Sigma$ and $p_L(z_p \mid z)$ (e.g., power law and Gaussian, respectively), this integral does not reduce to a simple analytic expression.
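Although Eq. (A1) has no simple closed form, it is straightforward to evaluate numerically. The sketch below treats the power-law and Gaussian case mentioned above under several assumptions of our own. It takes $\Delta\Sigma \propto R^{-\alpha}$. It assumes that, for lenses assigned photoz $z_p$, the true redshift is distributed as a Gaussian of width $\sigma_z$ about $z_p$. It uses a flat ΛCDM cosmology with a hypothetical $\Omega_m = 0.3$, so that $R = \theta_{ls}\chi(z)$ and $R/\hat{R} = \chi(z)/\chi(z_p)$. The returned factor is then $\widehat{\Delta\Sigma}(\hat{R})/\Delta\Sigma(\hat{R})$. It ignores the accompanying change in $\Sigma_c$, which enters through $b_z$ instead. All names and parameter values are hypothetical.

```python
import math


def comoving_distance(z, omega_m=0.3, n_steps=200):
    """Comoving distance in units of c/H0 for flat LCDM (Simpson's rule)."""
    if z <= 0:
        return 0.0
    n_steps += n_steps % 2
    h = z / n_steps

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1 + x) ** 3 + (1 - omega_m))

    s = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * inv_e(i * h)
    return s * h / 3.0


def smoothing_factor(z_phot, sigma_z, alpha, omega_m=0.3, n_grid=401, n_sigma=5.0):
    """Measured / true Delta Sigma at fixed R_hat for Delta Sigma proportional to R^-alpha.

    Integrates (chi(z)/chi(z_phot))^(-alpha) over a Gaussian p(z | z_phot),
    truncated at n_sigma and at z > 0, on a uniform grid.
    """
    chi_p = comoving_distance(z_phot, omega_m)
    lo = max(1e-4, z_phot - n_sigma * sigma_z)
    hi = z_phot + n_sigma * sigma_z
    num = den = 0.0
    for i in range(n_grid):
        z = lo + (hi - lo) * i / (n_grid - 1)
        p = math.exp(-0.5 * ((z - z_phot) / sigma_z) ** 2)
        num += p * (comoving_distance(z, omega_m) / chi_p) ** (-alpha)
        den += p
    return num / den


for alpha in (0.0, 0.5, 0.8, 1.2):
    print(alpha, round(smoothing_factor(0.3, 0.05, alpha), 4))
```

For $\alpha = 0$ the factor is exactly one, and it departs from unity as the slope changes. This is the scale dependence that, as argued below, must be handled at the interpretation stage rather than through a single calibration factor.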

Note that this effect is in some ways more pernicious than a pure calibration error, since it depends on the scale dependence of the true lensing signal $\Delta\Sigma$. This error must therefore be treated differently from a pure calibration error. Rather than changing the computation of the signal by incorporating a calibration factor, it must be incorporated at the interpretation step of the analysis, when some model is used to predict $\Delta\Sigma$. At that stage, the additional step of numerically convolving the prediction with $p_L(R \mid \hat{R})$ can be included before comparing against the data. The convolution will change the prediction. It will also induce some theoretical uncertainty, depending on the statistical and sampling variance uncertainty on $p_L(R \mid \hat{R})$. That theoretical uncertainty in the model prediction can be determined by using $p_L(R \mid \hat{R})$ from many realizations of the data to compute $\widehat{\Delta\Sigma}(\hat{R})$, and fitting for the model parameters on each realization.

A2 Redshift distributions for g-g lensing and 
cosmic shear 

The case of galaxy-galaxy lensing with a redshift distribution used for both lenses and sources, and the case of cosmic shear, are similar in several important aspects. In both cases, the observed signal is typically expressed in terms of shear as a function of angular separation (angle $\theta$ or multipole $\ell$). Most work either does not incorporate redshift information, or uses tomographic cosmic shear. In the tomographic approach, the photoz's are used to separate the source sample into several bins, with shear-shear autocorrelation functions measured in each bin (and cross-correlation functions measured between bins). The full redshift information ($dN/dz$, or $dN/dz$ for each bin) is then incorporated at the interpretation stage of the analysis. At that stage, a model for the signal (i.e., $\Delta\Sigma(R)$ in case (ii) or the convergence power spectrum in case (iii)) is transformed into the form of the observable to fit for the model parameters. In general, errors in the redshift distributions can lead to nontrivial changes in this prediction: not a pure calibration bias, but a change with some scale dependence. The choice of the wrong redshift distribution therefore leads to the selection of the wrong model parameters, because the theoretical predictions have been computed incorrectly. Here we assume that a spectroscopic training sample is being used to obtain the proper source redshift distribution in the mean. We would like to determine the uncertainty in the model parameters due to Poisson and sampling variance uncertainty in the source redshift distribution.

In practice, this uncertainty can be included in the analysis using straightforward modifications of the procedures described for galaxy-galaxy lensing with lens redshifts. For example, for g-g lensing without lens or source redshifts, one can use spectroscopic training samples with the same selection as the lens and source samples to create redshift histograms. These histograms can then be fitted to some functional form for many bootstrap resamplings of the redshift histogram pairs $(z_i, N_i)$. One can then generate the theoretical prediction for each of the many realizations of the best-fitting redshift histogram. Fitting for the model parameters on each one shows how much they vary due to the changes in the redshift histogram from realization to realization. For cosmic shear, this procedure can be adopted using a single spectroscopic calibration sample that is comparable to the source sample. The Poisson and LSS uncertainty in the redshift histograms will therefore be propagated to uncertainties on the model parameters.
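The bootstrap step described above can be sketched as follows. The functional form is our own choice for illustration: $dN/dz \propto z^2 \exp[-(z/z_0)^{1.5}]$, normalized analytically, with $z_0$ the only free parameter. The fit is a grid search on a Pearson-style $\chi^2$ over histograms with bins of width $\Delta z = 0.05$. Resampling is done over galaxies rather than over histogram entries, and all names and numbers are hypothetical. In a full analysis, each bootstrap $z_0$ would feed a separate theoretical prediction and model fit. The spread of those fitted parameters, not of $z_0$ itself, is the quantity of interest. Note that a plain bootstrap over galaxies captures Poisson noise but not sampling variance from LSS, which needs independent fields.

```python
import math
import random

BETA = 1.5  # hypothetical shape parameter of the toy dN/dz


def model_pdf(z, z0):
    """Normalized toy dN/dz proportional to z^2 exp[-(z/z0)^BETA]."""
    return BETA * z ** 2 * math.exp(-(z / z0) ** BETA) / (z0 ** 3 * math.gamma(3.0 / BETA))


def histogram(zs, dz, n_bins):
    """Counts in bins [i*dz, (i+1)*dz); objects beyond the last bin are dropped."""
    counts = [0] * n_bins
    for z in zs:
        i = int(z / dz)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def fit_z0(counts, dz, grid):
    """Grid-search z0 minimising a Pearson-style chi^2 between counts and model."""
    n = sum(counts)
    best, best_cost = None, float("inf")
    for z0 in grid:
        cost = 0.0
        for i, c in enumerate(counts):
            m = n * dz * model_pdf((i + 0.5) * dz, z0)  # expected counts at bin centre
            cost += (c - m) ** 2 / max(m, 1.0)
        if cost < best_cost:
            best, best_cost = z0, cost
    return best


def bootstrap_z0(z_spec, n_boot=100, dz=0.05, z_max=2.0, seed=0):
    """Mean, standard deviation and list of fitted z0 over bootstrap resamplings."""
    rng = random.Random(seed)
    n_bins = int(round(z_max / dz))
    grid = [0.1 + 0.004 * k for k in range(101)]  # z0 from 0.1 to 0.5
    fits = []
    for _ in range(n_boot):
        resample = rng.choices(z_spec, k=len(z_spec))
        fits.append(fit_z0(histogram(resample, dz, n_bins), dz, grid))
    mean = sum(fits) / n_boot
    std = math.sqrt(sum((f - mean) ** 2 for f in fits) / (n_boot - 1))
    return mean, std, fits


# Toy "spectroscopic sample": if u ~ Gamma(3/BETA, 1), then z = z0 * u^(1/BETA)
# follows the model dN/dz exactly.
rng = random.Random(1)
z_spec = [0.3 * rng.gammavariate(3.0 / BETA, 1.0) ** (1.0 / BETA) for _ in range(3000)]
mean_z0, sigma_z0, _ = bootstrap_z0(z_spec)
print(f"z0 = {mean_z0:.3f} +/- {sigma_z0:.3f}")
```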



